
Anthropomorphic visual sensors

2006, Encyclopedia of Sensors


Encyclopedia of Sensors, www.aspbs.com/eos
Edited by C. A. Grimes, E. C. Dickey, and M. V. Pishko. Volume X, Pages 1–16. © 2006 American Scientific Publishers.

Anthropomorphic Visual Sensors

Fabio Berton, Giulio Sandini, Giorgio Metta
LIRA-Lab, DIST, Università di Genova, Genova, Italy

CONTENTS
1. Introduction and Motivations
2. The Log-Polar Mapping
3. The Log-Polar Sensor
4. Mathematical Properties of the Log-Polar Transform
5. Applications
Glossary
References

1. INTRODUCTION AND MOTIVATIONS

Attempts to reproduce a biological eye with artificial electronic devices have always faced a major obstacle: there is an important difference between how a human eye sees the world and how a standard video camera does. While common visual sensors generally have constant resolution across the image, in their biological counterparts the picture elements are arranged so as to gather a very high amount of information in the central part of the field of view (the so-called fovea), with a gradually decreasing density of photoreceptors towards the borders of the receptive field. There is a practical motivation behind this evolutionary choice: a human being needs both high resolution, in order to distinguish the small details of a particular object during fine movements (the human eye has a maximum resolution of about 1/60 of a degree), and, at the same time, a large enough field of view (about 150 degrees horizontally and about 120 degrees vertically for the human eye) to have sufficient perception of the surrounding environment. With a constant-resolution array of sensors, these two constraints together would have pushed the total number of photoreceptors to an incredibly high value, and the consequences would have been equally unrealistic: an optic nerve a few centimeters in diameter (the actual human optic nerve is about 1.5 mm across) to transfer this amount of data, and a much bigger brain (weighing about 2300 kg, compared to the roughly 1.4 kg of our brain) to process all this information, not to mention the huge power requirements of such a big brain.

Since a zooming capability would have meant giving up the simultaneity of the two features, evolution instead answered the question of how to optimally arrange a given number of photoreceptors over a finite, small surface. Many different eyes evolved, with the disposition of the photoreceptors adapted to each particular niche. Examples of this diversity can be found in the eyes of insects (see, for example, [1] for a review) and in those of some birds that have two foveal regions to allow simultaneous flying and hunting [2, 3]. In the human eye (Fig. 1.1) there is a very high density of cones (the color-sensitive photoreceptors) in the central part of the retina and a decreasing density moving towards the periphery. The second kind of receptor (the rods, sensitive to luminance) is absent in the fovea but has a similar spatial distribution. The cone density in the foveola (the central part of the fovea) is estimated at about 150,000–180,000 cones/mm² (see Fig. 1.1). Towards the retinal periphery, cone density decreases from 6000 cones/mm² at a distance of 1.5 mm from the fovea to 2500 cells/mm² close to the ora serrata (the extremity of the optic part of the retina, marking the limit of the percipient portion of the membrane). Rod density peaks at 150,000 rods/mm² at a distance of about 3–5 mm from the foveola.
Cone diameter increases from the center (3.3 µm at a distance of 40 µm from the foveola) towards the periphery (about 10 µm). Rod diameter increases from 3 µm at the area of highest rod density to 5.5 µm in the periphery [4].

Figure 1.1. Cones and Rods Density: The density of the cones is maximum in the center of the fovea and rapidly decreases towards the periphery. The rods are absent in the fovea but have a similar decreasing property.

Since this sensor arrangement has been proven by evolution to be an efficient one, we investigated how this higher efficiency could be translated into the world of artificial vision. From the visual-processing point of view we asked, on one hand, whether the morphology of the visual sensor facilitates particular sensorimotor coordination strategies and, on the other, how vision determines and shapes the acquisition of behaviors that are not necessarily purely visual in nature. Here too we must note that eyes and motor behaviors coevolved: it does not make sense to have a fovea if the eyes cannot be swiftly moved over possible regions of interest (active vision). Humans developed a sophisticated oculomotor apparatus that includes saccadic movements, smooth tracking, vergence, and various combinations of retinal and extra-retinal signals to maintain vision efficient in a wide variety of situations (see [5] for a review).

Since coupling this space-variant structure (Fig. 1.2) with an active vision system gives the capability of always seeing the regions of interest with the best available quality, it is expected that transposing these principles to an artificial eye would be extremely efficient as well. This addresses the question of why it might be worth copying from biology and what the motivations are for pursuing the realization of biologically inspired artifacts. How this has been done is presented in the following sections, where we describe the development of a retina-like camera. Examples of applications are also discussed in the fields of image transmission and robotics. The image transmission problem resembles the bandwidth/size limitation of the optic nerve discussed above; in the case of autonomous robots, the limitations are in terms of computational resources and power consumption.

Figure 1.2. Biological eye: (a) The real world (courtesy Tom Dorsey/Salina Journal) and (b) how the human eye perceives it.

2. THE LOG-POLAR MAPPING

While the arrangement of photoreceptors on the human retina is quite complex and irregular, some approximations reproduce their disposition well. The simple mathematical mapping that best fits this space-variant sensor structure is the log-polar mapping, which is also called, for this reason, the retina-like mapping [6–8]. The log-polar transform is defined by the function [9]:

$$ w = f(z) = \log_a z \tag{2.1} $$

where both $z$ and $w$ are complex variables. While

$$ z = x + iy = r(\cos\theta + i\sin\theta) \tag{2.2} $$

is the representation of a point in the cartesian domain,

$$ w = \rho(z) + i\,\theta(z) \tag{2.3} $$

represents the position of a point on the log-polar plane. Equation (2.1) can also be written as:

$$ \rho(r,\theta) = \log_a r, \qquad \theta(r,\theta) = h\,\theta \tag{2.4} $$

Equations (2.4) point out the logarithmic and the polar structure of the mapping: $\rho$ is directly linked to $\log r$, and the log-polar angular coordinate is proportional to the angle $\theta$ of a traditional polar coordinate system, defined by:

$$ r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\frac{y}{x} \tag{2.5} $$
In Fig. 2.1a it is possible to see how the regions of the rectangular grid in the log-polar domain are arranged in cartesian coordinates (x, y): they form concentric rings whose width is proportional to their distance from the origin of the mapping. Each ring is divided into a certain number of receptive fields, each one corresponding to a pixel in the cortical image, Fig. 2.1b. The receptive fields that lie on a radius are mapped onto vertical lines of the (ρ, θ) plane, while those lying on concentric circumferences centered on the origin are mapped onto horizontal lines. The origin of the (x, y) plane has no corresponding point on the (ρ, θ) plane, because of the singularity of the complex logarithm at that point; theoretically, an infinite number of concentric rings exists below any radius of finite size. A more common expression for the log-polar transform is therefore the following:

$$ \rho(r,\theta) = \log_a \frac{r}{r_0}, \qquad \theta(r,\theta) = h\,\theta \tag{2.6} $$

Figure 2.1. Log-polar Transform: (a) The grid in cartesian coordinates and (b) its transform. The areas marked in gray are each other's transformation.

The logarithm base $a$ is determined by the number of pixels we want to lay on one ring, and by their shape. While the qualitative shape of a pixel is fixed (it is the intersection between an annulus and a circular sector, see Fig. 2.2), its proportions may vary. If we require the photoreceptor's shape to be approximately square, we can state that the length of the radial segment BC should equal the length of the arc DC or of the arc AB. With $N$ pixels per ring, in the first case the base of the logarithm will be:

$$ a = \frac{2\pi + N}{N} \tag{2.7} $$

while in the second case we will have:

$$ a = \frac{N}{N - 2\pi} \tag{2.8} $$

Usually, when handling real situations, due to technological constraints the size of a pixel in an actual sensor cannot decrease without bound, so the mapping has to stop when the size of the receptive field approaches the size of the smallest pixel allowed by the available technology. The consequence is that there is an empty circular area in the center of the field of view where the space-variant sensor is blind. This lack of sensitivity can be avoided by filling the area with uniformly spaced photoreceptors whose size equals that of the smallest pixel in the logarithmic part of the sensor. The whole sensor then has a central part, i.e., the fovea, where the polar structure is still preserved but the space-variant resolution (the logarithmic decrease) is lost. This choice automatically determines the value of the shift factor r0 in (2.6). In fact, if we decide to arrange i rings in the fovea (with indices 0, …, i − 1), we have to assign index i to the first logarithmic ring. Since by varying the value of r0 we can overlap any chosen ring of the logarithmic part of the sensor onto the ith ring of the fovea, r0 is thereby determined.
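To make these design choices concrete, the short sketch below computes the logarithm base a from the number of pixels per ring using Eqs. (2.7) and (2.8), and derives a shift factor r0 by stacking i uniform foveal rings of the smallest pixel size. The specific numbers (N = 64 pixels per ring, 30 µm pixels, 5 foveal rings) are illustrative assumptions of this sketch, not the parameters of any particular sensor described below.

```python
import math

def log_base(n, case="DC"):
    """Base of the logarithm for approximately square pixels.
    case 'DC': radial side equals arc DC, Eq. (2.7);
    case 'AB': radial side equals arc AB, Eq. (2.8)."""
    if case == "DC":
        return (2 * math.pi + n) / n
    return n / (n - 2 * math.pi)

# Illustrative values (assumptions, not one of the actual sensor designs):
N = 64            # pixels per ring
pixel = 30e-6     # smallest pixel size [m]
i = 5             # uniform rings in the fovea

a = log_base(N)
# The fovea is covered by i rings of width `pixel`; the first logarithmic
# ring (index i) must begin at the foveal radius, which fixes r0 in Eq. (2.6):
r_fovea = i * pixel
r0 = r_fovea / a ** i
print(f"a = {a:.4f}, r0 = {r0 * 1e6:.1f} um")
```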
Figure 2.2. Log-polar pixel.

The most general equation describing the log-polar transform introduces into equation (2.6) a proportionality constant k, in order to scale the mapping to cover a desired area while keeping the total number of log-polar pixels and the previously mentioned shift constant r0:

$$ \rho(r,\theta) = k \log_a \frac{r}{r_0}, \qquad \theta(r,\theta) = h\,\theta \tag{2.9} $$

The equations needed to perform a log-polar transform (2.10) and its anti-transform (2.11) are then:

$$ \rho(x,y) = k \log_a \frac{\sqrt{x^2+y^2}}{r_0}, \qquad \theta(x,y) = h \arctan\frac{y}{x} \tag{2.10} $$

and:

$$ x(\rho,\theta) = r_0\, a^{\rho/k} \cos\frac{\theta}{h}, \qquad y(\rho,\theta) = r_0\, a^{\rho/k} \sin\frac{\theta}{h} \tag{2.11} $$

Both of the previous equations are valid for the logarithmic part of the mapping, while in the fovea the following equations are used:

$$ \rho(x,y) = k \sqrt{x^2+y^2}, \qquad \theta(x,y) = h \arctan\frac{y}{x} \tag{2.12} $$

and:

$$ x(\rho,\theta) = \frac{\rho}{k} \cos\frac{\theta}{h}, \qquad y(\rho,\theta) = \frac{\rho}{k} \sin\frac{\theta}{h} \tag{2.13} $$

In general the choice of the various parameters, especially when handling the discrete log-polar transform, is crucial: it deeply affects the quality of the images, the compression factor, and the size and relevance of various artifacts and false colors [10]. Peters [11] has investigated a method for optimally choosing the parameters of the mapping.
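For concreteness, here is a minimal numpy sketch of the discrete transform: it resamples a cartesian image into a log-polar image by inverse mapping through Eqs. (2.11) and (2.13). The nearest-neighbor sampling, the grid sizes, and the parameter values are assumptions of this sketch, not the design of the sensors described below.

```python
import numpy as np

def logpolar_remap(img, n_rings=64, n_wedges=128, r0=3.0, rho_fovea=8):
    """Resample a square cartesian image into a (n_rings x n_wedges)
    log-polar image by inverse mapping, Eqs. (2.11) and (2.13)."""
    size = img.shape[0]
    cx = cy = size / 2.0
    r_max = size / 2.0
    # Base chosen so the outermost ring stays within the image border.
    a = (r_max / r0) ** (1.0 / (n_rings - rho_fovea))
    rho = np.arange(n_rings).reshape(-1, 1)               # ring index
    theta = 2 * np.pi * np.arange(n_wedges) / n_wedges    # wedge angle
    # Foveal rings are linearly spaced (2.13); peripheral ones logarithmic (2.11).
    r = np.where(rho < rho_fovea,
                 r0 * rho / rho_fovea,
                 r0 * a ** (rho - rho_fovea))
    x = np.clip(cx + r * np.cos(theta), 0, size - 1).astype(int)
    y = np.clip(cy + r * np.sin(theta), 0, size - 1).astype(int)
    return img[y, x]                                      # nearest-neighbor pick

# Usage: remap a synthetic 256x256 gradient image.
img = np.linspace(0, 255, 256 * 256).reshape(256, 256)
lp = logpolar_remap(img)
print(lp.shape)   # (64, 128): 8192 log-polar pixels instead of 65536
```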
3. THE LOG-POLAR SENSOR

In the 1980s various researchers began to transfer this efficient idea from the biological world to the artificial one, so as to take advantage of the reduced cost of image acquisition. As stated before, this means lower power consumption, fewer pixels to be acquired, and shorter processing time. Moreover, such a family of sensors is important for an artificial being that mimics a complex system such as the human body. Various approaches have been tried to build a visual system capable of producing foveated images, but each choice presented some drawbacks. The first, and most intuitive, solution was a standard CCD video camera connected to a workstation with a frame grabber and software able to perform the log-polar transform. Unfortunately, the technology available at that time did not allow a software simulation of the log-polar mapping or, at best, when this was possible, its performance was not comparable with that of a solid-state device. The next step was therefore the design of dedicated hardware (see, for example, [12–16]) able to speed up the remapping process; but the maximum resolution was limited by the resolution of the traditional camera, not to mention the additional cost of the electronic board and the need to acquire many more pixels than appear in the output image. Another approach to obtaining a foveated image was the use of distorting lenses that enlarge the central part of the image while keeping the peripheral region unchanged [17, 18], but some additional operations were still needed to obtain an actual log-polar transform. It was therefore decided to design a completely new, biologically inspired visual system implemented in silicon. Besides our realizations, a few other attempts at implementing solid-state retina-like sensors have been reported in the literature [19–21]. So far we are not aware of any commercial device, besides those described here, based on log-polar retina-like sensors.

3.1. The 2000 pixels CCD Sensor

Our first implementation of a solid-state foveated retina-like sensor was realized at the beginning of the 1990s using a 1.5 µm CCD technology [22]. At that time this was state-of-the-art technology; it allowed a smallest pixel (i.e., a pixel in the foveal part of the image) of about 30 µm, while the diameter of the whole sensor, for practical reasons, was limited to 9.4 mm. This sensor was composed of 30 rings, each covered by 64 pixels, for a total of 1920 pixels in the log-polar part of the sensor (Fig. 3.1a), to which 102 more pixels covering the fovea must be added, for a total of 2022 elements. The fovea was covered by a uniform grid of square pixels arranged in a pseudo-lozenge, roughly an 11 × 11 cartesian grid with missing corners and one diagonal, as in Fig. 3.1b. Since the polar structure was not preserved in the fovea, a major discontinuity was present at the border between the two regions of the sensor.

Since the size of the largest pixel in this first CCD implementation was about 412 µm, the ratio R between the largest and the smallest pixels was about 13.7. This parameter describes the amount of "space variancy" of the sensor; it is of course equal to 1 in standard cartesian sensors with constant resolution. Another important parameter in space-variant sensors is the ratio between the diameter of the whole sensor and the diameter of the smallest pixel (equivalently, the square root of the ratio of their areas). Rojer and Schwartz [14] defined this value as Q, a measure of the spatial quality of a space-variant sensor. As a reference, consider that Q is obviously equal to the size of the sensor (measured in pixels) in a constant-resolution array, while it can have values very close to 10,000 in the human retina. The importance of this parameter can be understood by observing that its value is the square root of the number of pixels we would need to cover the space-variant sensor with a constant-resolution grid having the same maximum resolution as the log-polar sensor (and the same field of view). Rojer and Schwartz also proved that Q increases exponentially with the total number of pixels in the log-polar sensor, so the addition of a few more rings yields a much better value of Q. For the retina-like CCD sensor the parameter Q is equal to about 300, meaning that, to simulate a log-polar camera electronically starting from a traditional one, the latter should have a sensor able to acquire at least a 300 × 300 square image in order to obtain the same amount of information that the solid-state retinal sensor obtains directly.

This sensor was the first solid-state device of its kind in the world, but it presented some drawbacks, mostly related to the CCD technology itself, such as the difficulty of properly addressing the pixels, which caused the presence of some blind areas on the sensor (one in the fovea, along a diameter, and a circular sector in the periphery about 14 degrees, or 2.5 pixels, wide; both are shown in Fig. 3.1).

Figure 3.1. 2000 pixels CCD log-polar sensor: (a) Picture of the whole CCD log-polar sensor. (b) Detail of the fovea.
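These figures can be checked directly from the definitions; the snippet below reproduces R, Q, and the size of the equivalent constant-resolution image from the numbers quoted above.

```python
# Space-variancy parameters of the 2000-pixel CCD sensor (values from the text).
d_sensor = 9400e-6    # sensor diameter [m]
p_min = 30e-6         # smallest (foveal) pixel [m]
p_max = 412e-6        # largest (peripheral) pixel [m]

R = p_max / p_min     # largest/smallest pixel ratio
Q = d_sensor / p_min  # sensor/smallest-pixel diameter ratio
print(f"R = {R:.1f}")  # ~13.7
print(f"Q = {Q:.0f}")  # ~313, i.e. 'about 300'
print(f"equivalent cartesian image: {Q:.0f} x {Q:.0f} "
      f"= {Q * Q:.0f} pixels vs 2022 actually acquired")
```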
3.2. The 8000 pixels CMOS Sensor

The next generation of the sensor had some very important new features. First of all, the evolution of the technology allowed the construction of a much smaller pixel, so a significantly greater number of photoreceptors could be fitted into a sensor of roughly the same size as the CCD version. Moreover, to avoid some of the problems that afflicted the previous version, we decided to move to CMOS technology. Since addressability in this new sensor was much simpler, no blind areas were present on its surface.

The pixel chosen for the 8000-point sensor was the FUGA model from IMEC (now FillFactory) in Leuven, Belgium. These pixels distinguish themselves from classical CCD or CMOS sensors by their random addressability and logarithmic photoelectric transfer. The random addressability allowed, in principle, reading out only chosen parts of the sensor, while the logarithmic behavior allowed the sensor to acquire images correctly even in extreme lighting conditions. The logarithmic response yielded a dynamic range beyond six decades (120 dB) by log-compressing the input optical power scale onto a linear output voltage scale: the new pixel could view scenes with vastly different luminances in the same image, without even the need to set an exposure time. It is worth noting that this behavior mimics very well that of our eyes in similar conditions. The major drawback of this technology was the introduction of the so-called fixed pattern noise (FPN), which is common to all CMOS visual sensors. The continuous-time readout of a logarithmic pixel makes it impossible to remove static pixel-to-pixel offsets on chip. As a result, the raw image output of such a sensor contains a large overall non-uniformity, often up to 50 or 100% of the actual acquired signal itself. However, things are not dramatic: the FPN is almost static in time, so it can be nearly completely removed by a simple first-order correction.

The second new feature was the preservation of the polar structure in the fovea. Although the number of pixels per ring was no longer constant, the presence of a polar arrangement minimized the effect of the fovea–periphery discontinuity. The fovea was structured with a central pixel, then a ring with 4 pixels, one with 8, 2 with 16, 5 with 32, and 10 with 64 pixels per ring, for a total of 845 pixels on 20 rings. Starting from the 21st ring, the logarithmic law was applied (Fig. 3.3).

The third macroscopic difference compared to the CCD sensor was that this time a color version of the chip was produced. Since a pixel is normally sensitive to just one wavelength, color has to be reconstructed for each photosite by interpolating the outputs of the neighboring pixels. This is a common operation, needed on standard constant-resolution arrays as well, and various patterns have been investigated to minimize the appearance of false colors. Problems arise, however, with the large pixels close to the external border (low spatial frequency): when the image contains very high spatial frequencies, adjoining pixels "see" different elements of the image, so the color components used for the reconstruction belong to uncorrelated objects, and that causes the false colors.

Figure 3.2. 2000 pixels CCD Log-polar Sensor Simulation: (a) A standard cartesian image. (b) Its log-polar transform performed by this sensor. (c) The remapped image. Please note that in this figure, and in the following Figs. 3.4 and 3.6, the log-polar image and its transform are not to scale. In all these images the fovea is not displayed.

Figure 3.3. CMOS 8000 pixel Log-polar sensor: (a) Picture of the 1st version of the CMOS log-polar sensor. (b) Detail of the fovea.

Figure 3.4. 8000 pixels CMOS Log-polar sensor simulation: (a) The log-polar transform of the image 3.2a performed by this sensor. (b) The remapped image.
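Because the FPN is essentially static, the first-order correction reduces to subtracting a per-pixel offset calibrated once from frames of a uniform scene. The sketch below illustrates the idea; the calibration procedure and the array sizes are illustrative assumptions, not details of the FUGA readout.

```python
import numpy as np

def calibrate_fpn(flat_frames):
    """Estimate static per-pixel offsets from frames of a uniform scene."""
    mean_frame = np.mean(flat_frames, axis=0)   # average out temporal noise
    return mean_frame - mean_frame.mean()       # offset of each pixel

def correct_fpn(raw, offsets):
    """First-order FPN correction: subtract the calibrated offsets."""
    return raw - offsets

# Usage with synthetic data: a uniform scene plus a fixed pixel pattern.
rng = np.random.default_rng(0)
fpn = rng.normal(0.0, 0.5, size=(64, 128))      # static pixel-to-pixel pattern
flats = 100.0 + fpn + rng.normal(0, 0.05, size=(16, 64, 128))
offsets = calibrate_fpn(flats)
frame = 42.0 + fpn                               # any later acquisition
print(np.abs(correct_fpn(frame, offsets) - 42.0).max())  # ~residual noise only
```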
3.3. The 33000 pixels CMOS Sensor

In the late 1990s, progress in the field of silicon manufacturing allowed us to design a new chip using a 0.35 µm technology. This project was developed within an EU-funded research project called SVAVISCA. The goal of the project was to realize, besides the improved version of the sensor, a micro camera with a special-purpose lens allowing a 140-degree field of view. The miniaturization of the camera was possible because some of the electronics required to drive the sensor, as well as the A/D converter, were included in the chip itself.

The main novelty introduced by this sensor was the structure of the fovea, which was this time completely filled with pixels. The 42 rings inside the fovea, while still having a variable number of pixels each, decreased this number at a constant rate: not counting ring 0, which was just a single pixel, each ring had a number of pixels proportional to its distance from the center of the sensor (Fig. 3.5). A minor change in the layout was the adoption of a pseudo-triangular tessellation, to minimize the artifacts introduced by the low spatial frequency of the periphery: each even ring was rotated by half a pixel (about 0.7 degrees) with respect to the odd rings. As a consequence, the average distance of a point from the closest red, green, and blue pixels was smaller than with the square tessellation, which allowed a more accurate color reconstruction.

Since the parameter Q of this version is about 1100, for the first time the log-polar image acquired by the camera was "better" than the images produced by the standard off-the-shelf video cameras available at that time. Better means that the remapped log-polar image had a higher maximum resolution than its cartesian counterpart, while using only about one thirtieth of the pixels. This allowed sending video streams over low-bandwidth channels at a very high frame rate, using standard existing compression algorithms.

Figure 3.5. CMOS 33000 pixel Log-polar sensor: (a) Picture of the 2nd version of the CMOS log-polar sensor. (b) Detail of the fovea.

Figure 3.6. 33000 pixels CMOS Log-polar sensor simulation: (a) The log-polar transform of the image 3.2a performed by this sensor. (b) The remapped image. Although the remapped images are not to scale, the log-polar ones are to scale with respect to each other.

Table 1. A comparison between three generations of log-polar sensors.

| Sensor Version | Total Number of Pixels | Pixels in Fovea | Pixels in Periphery | Rings in Fovea | Rings in Periphery | Total Number of Rings | Pixels per Ring |
|---|---|---|---|---|---|---|---|
| CCD | 2022 | 102 | 1920 | — | 30 | 30 | 64 |
| CMOS 8k | 8013 | 845 | 7168 | 20 | 56 | 76 | 128 |
| CMOS 33k | 33193 | 5473 | 27720 | 42 | 110 | 152 | 252 |

| Sensor Version | Ø of the Sensor | Size of the Smallest Pixel | Radius of the Fovea | Angular Amplitude | Logarithm Base | R | Q | Technology Used |
|---|---|---|---|---|---|---|---|---|
| CCD | 9400 µm | 30 µm | 317 µm | 5.413° | 1.094 | 13.7 | 300 | 1.5 µm |
| CMOS 8k | 8100 µm | 14 µm | 285 µm | 2.812° | 1.049 | 14 | 600 | 0.7 µm |
| CMOS 33k | 7100 µm | 6.5 µm | 273 µm | 1.428° | 1.02337 | 17 | 1100 | 0.35 µm |
It was even possible to transmit a video stream over a GSM cell-phone channel (a 9600 bit-per-second channel), achieving a frame rate between 1 and 2 frames per second.

3.4. A Comparison Between the Log-Polar Sensors

To better understand the evolution of the log-polar sensor, it is useful to compare the main characteristics of the three versions. The increase in image quality is evident from Table 1, and it is shown in the simulated images in Figs. 3.2, 3.4, and 3.6: while the numbers on the clock are completely unreadable with the 2000-pixel sensor, they are much better defined in the later versions.

4. MATHEMATICAL PROPERTIES OF THE LOG-POLAR TRANSFORM

The log-polar transform is not only the best compromise between biological motivation and simplicity of the mapping; it also presents some interesting mathematical features that can be exploited for a more efficient implementation of many image-processing algorithms.

4.1. Conformity

The log-polar mapping is conformal. A conformal mapping, also known as a conformal transformation, angle-preserving transformation, or biholomorphic map, is a transformation that preserves local angles [23]. An analytic function is conformal at any point where it has a nonzero derivative; conversely, any conformal mapping of a complex variable that has continuous partial derivatives is analytic. The demonstration of the angle-preserving property starts from the complex function:

$$ w = f(z) = \log z \tag{4.1} $$

where $z$ is a complex number in the cartesian domain, $z = x + jy$, and $w$ is a complex number in the log-polar domain, $w = \rho + j\theta = |w|\,e^{j\arg(w)}$. Then we get:

$$ w = \log\!\left(|z|\,e^{j\arg(z)}\right) = \log|z| + j\arg(z) \tag{4.2} $$

If we set:

$$ \Delta w = w - w_0, \qquad \Delta z = z - z_0 \tag{4.3} $$

and we consider the Taylor expansion of $f(z)$ centered at $z_0$, we get:

$$ f(z) = f(z_0) + \sum_{n=1}^{\infty} \frac{f^{(n)}(z_0)}{n!}\,\Delta z^n \;\Rightarrow\; f(z) - f(z_0) = \sum_{n=1}^{\infty} \frac{f^{(n)}(z_0)}{n!}\,\Delta z^n = \Delta w \tag{4.4} $$

If we stop at the first-order approximation, then:

$$ \Delta w \approx f'(z_0)\cdot\Delta z, \qquad f'(z_0) = \frac{1}{z_0} \neq 0 \tag{4.5} $$

So, when $|\Delta z| \to 0$, $\arg(\Delta w) = \arg(\Delta z) + \arg(f'(z_0))$; setting $\arg(\Delta z) = \varepsilon$ and $\arg(f'(z_0)) = \omega$, we have $\arg(\Delta w) = \varepsilon + \omega$. This means that segments in the cartesian plane are just rotated (locally) by an angle $\omega$ with respect to the same segments in the log-polar domain; furthermore, the segment $\Delta z$ is scaled in length by $|f'(z_0)| = 1/|z_0|$. The preservation of angles follows: in fact, if we set

$$ \arg(\Delta z_1) = \varepsilon_1, \qquad \arg(\Delta z_2) = \varepsilon_2 \tag{4.6} $$

then:

$$ \arg(\Delta w_2) - \arg(\Delta w_1) = (\varepsilon_2 + \omega) - (\varepsilon_1 + \omega) = \arg(\Delta z_2) - \arg(\Delta z_1) \tag{4.7} $$

So, when the derivative at $z_0$ is not zero (which is always true for the logarithmic function), the angles are preserved (Fig. 4.1), and the mapping is conformal. The preservation of angles implies the preservation of proximity: a pair of pixels that are close in the cartesian domain are still close in the log-polar one, if we do not consider the discontinuities that occur at $\theta = 0$, $\theta = 2\pi$, and $\rho = 0$. These two peculiarities mean that any algorithm involving local operators can be applied to the log-polar image without significant change (angle detection, edge detection, compression, etc.) [8]. To avoid the problems introduced by these discontinuities, a graph-theory-based approach has been investigated, defining a connectivity graph matching the morphology of the sensor [15, 24].

Figure 4.1. Angle Preservation: Angles are locally preserved by the log-polar transformation. (a) Cartesian domain. (b) Log-polar domain. Please note that both images (a) and (b) are details, so the origin of the mapping falls outside the cartesian image.
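Equation (4.7) can be checked numerically in a few lines: take two small displacements around a point z0, map the three points through log z, and compare the angle between the displacements before and after (an illustrative check only).

```python
import numpy as np

z0 = 3.0 + 2.0j
d1 = 1e-6 * np.exp(1j * 0.3)   # two tiny segments leaving z0
d2 = 1e-6 * np.exp(1j * 1.1)

def angle_between(a, b):
    return np.angle(b) - np.angle(a)

before = angle_between(d1, d2)
after = angle_between(np.log(z0 + d1) - np.log(z0),
                      np.log(z0 + d2) - np.log(z0))
print(before, after)   # both ~0.8 rad: the local angle is preserved
```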
4.2. Scale Change

One interesting and useful property of the log-polar transform is its invariance to scale change. A pure scale change in the cartesian domain can be seen as a transformation in which all the vectors representing the pixels of an object are mapped to another set of vectors, each proportional by a common constant k to its pre-transformation counterpart. A scale change in the cartesian domain, referred to the center of the log-polar mapping, is then given by:

$$ w = \log(kz) = \log\!\left(|kz|\,e^{j\arg(kz)}\right) = \log|z| + \log k + j\arg(z) \tag{4.8} $$

with $k \in \mathbb{R}^+$. This is equal to $\log(z) + K$, with the constant $K = \log k$, which is a pure translation along the $\rho$ axis in the log-polar domain (vertical in our representation, Fig. 4.2). The invariance to scale change can be very important and useful in applications where the camera moves along its optical axis, such as time-to-impact detection on a mobile vehicle.

Figure 4.2. Scale Change: A pure scale change referred to the origin of the mapping, with no translational components, becomes a pure translation after the log-polar transform. (a) Cartesian domain. (b) Log-polar domain. (c), (d) Enlargement of the shaded areas in (a) and (b), respectively.

4.3. Rotation

Another property is the invariance to rotation. A pure rotation in the cartesian domain can be seen as a transformation in which all the vectors representing the pixels of an object are mapped to another set of vectors whose phase is increased by a common constant θ0 compared to the pre-transformation counterpart. A rotation in the cartesian domain, referred to the center of the log-polar mapping, is then given by:

$$ \log\!\left(|z|\,e^{j(\arg(z)+\theta_0)}\right) = \log|z| + j\left(\arg(z) + \theta_0\right) \tag{4.9} $$

This is equal to $\log(z) + j\theta_0$, with the constant $j\theta_0$ purely imaginary, which is a pure translation along the $\theta$ axis in the log-polar domain (horizontal in our representation, Fig. 4.3).

Figure 4.3. Rotation: A pure rotation referred to the origin of the mapping, with no translational components, becomes a pure translation after the log-polar transform. (a) Cartesian domain. (b) Log-polar domain. (c), (d) Enlargement of the shaded areas in (a) and (b), respectively.

4.4. Translations

While a translation in the cartesian space can be performed without any change in the shape of an object, this is no longer true in the log-polar domain. In fact, given a point $P(x_0, y_0)$, its transform is:

$$ P_{LP}(\rho_0, \theta_0) = P_{LP}\!\left(\log_a \frac{\sqrt{x_0^2 + y_0^2}}{r_0},\; \arctan\frac{y_0}{x_0}\right) \tag{4.10} $$

After a translation this point becomes:

$$ P(x_0 + \Delta x,\, y_0 + \Delta y) = P_{LP}\!\left(\log_a \frac{\sqrt{(x_0+\Delta x)^2 + (y_0+\Delta y)^2}}{r_0},\; \arctan\frac{y_0 + \Delta y}{x_0 + \Delta x}\right) = P_{LP}\big(\rho_0 + \Delta\rho,\; \theta_0 + \Delta\theta\big) \tag{4.11} $$

Since $\Delta\rho$ and $\Delta\theta$ are non-linear functions of the original position, a translation in the cartesian plane becomes a deformation in the log-polar one, Fig. 4.4.

Figure 4.4. Translation: A pure translation implies a deformation of the object after the log-polar transform. (a) Cartesian domain. (b) Log-polar domain. (c), (d) Enlargement of the shaded areas in (a) and (b), respectively.
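Both invariances are easy to verify numerically: scaling or rotating a set of points only shifts their log-polar coordinates by a constant. A minimal check, using the complex-log form of Eq. (2.1):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=8) + 1j * rng.normal(size=8)   # arbitrary object points

def logpolar(z):
    return np.log(np.abs(z)), np.angle(z)          # (rho, theta)

rho, theta = logpolar(z)
rho_s, theta_s = logpolar(2.5 * z)                 # pure scale change, k = 2.5
rho_r, theta_r = logpolar(z * np.exp(1j * 0.4))    # pure rotation by 0.4 rad

print(np.allclose(rho_s - rho, np.log(2.5)))       # True: scale -> rho shift
print(np.allclose(theta_s, theta))                 # True: angles untouched
print(np.allclose(rho_r, rho))                     # True: rotation keeps rho
# theta_r - theta equals 0.4 up to the 2*pi wrap-around of np.angle:
print(np.allclose(np.mod(theta_r - theta, 2 * np.pi), 0.4))
```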
4.5. The Fourier-Mellin Transform

We have seen that the log-polar transform turns scale changes and rotations into vertical and horizontal translations. This property has an important application in the Fourier-Mellin transform [25]. One of the most important properties of the Fourier transform is the invariance of its magnitude to translations. The idea is to take advantage of the combination of the properties of the Fourier and log-polar transforms in order to get a tool that is invariant to translations, scale changes, and rotations. The block diagram of the algorithm is shown in Fig. 4.5. If we consider two images a(x, y) and b(x, y), where b is a rotated, scaled, and translated copy of a:

$$ b(x, y) = a\big(k(x\cos\alpha + y\sin\alpha) - x_0,\; k(-x\sin\alpha + y\cos\alpha) - y_0\big) \tag{4.12} $$

where α is the rotation angle, k is the scale factor, and x0 and y0 are the translation offsets, then the Fourier transforms A(u, v) and B(u, v) of a and b, respectively, are related by:

$$ B(u, v) = \frac{e^{-j\phi_b(u,v)}}{k^2}\cdot A\!\left(\frac{u\cos\alpha + v\sin\alpha}{k},\; \frac{-u\sin\alpha + v\cos\alpha}{k}\right) \tag{4.13} $$

where φ_b(u, v) is the spectral phase of the image b(x, y). This phase depends on the rotation, translation, and scale change, but the spectral magnitude

$$ |B(u, v)| = \frac{1}{k^2}\left|A\!\left(\frac{u\cos\alpha + v\sin\alpha}{k},\; \frac{-u\sin\alpha + v\cos\alpha}{k}\right)\right| \tag{4.14} $$

is invariant to translations. Equation (4.14) shows that a rotation of the image a(x, y) rotates the spectral magnitude by the same angle α, and that a scale change of k scales the spectral magnitude coordinates by k⁻¹. At the spectral origin (u = 0, v = 0), however, neither a scale change nor a rotation has any effect. Rotation and scale change can thus be decoupled around this spectral origin by expressing the spectral magnitudes of a and b in log-polar coordinates, obtaining:

$$ B_{LP}(\rho, \theta) = \frac{1}{k^2}\,A_{LP}(\rho - \kappa,\; \theta - \alpha) \tag{4.15} $$

where ρ = log r and κ = log k, while r and θ are the usual polar coordinates. Hence an image rotation (α) shifts the image along the angular axis, and a scale change (k) is reduced to a shift along the radial axis and scales the intensity by the constant k⁻². This leads to both rotation and scaling now being simple translations, so that taking a Fourier transform of this log-polar representation reduces these effects to phase shifts, and the magnitudes of the two images are the same. This is known as the Fourier-Mellin transform, and it can be used to compare a single template against an unknown image, which will be matched even if it has undergone rotation, scaling, or translation.

Figure 4.5. The Fourier-Mellin Transform: Input Image → FFT (magnitude) → Cartesian to log-polar → Mellin Transform → Output Image.
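A compact sketch of the pipeline of Fig. 4.5 is given below: FFT magnitude, log-polar resampling, then phase correlation to read off rotation and scale as shifts. The grid construction, the use of scipy.ndimage.map_coordinates, and the bin-to-angle/scale conversions (which use approximate bin arithmetic) are implementation choices of this sketch, not of the original references.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def logpolar_magnitude(img, n_rho=128, n_theta=180):
    """|FFT| of img, resampled on a log-polar grid centered on the DC term."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = np.array(mag.shape) / 2.0
    r_max = min(cy, cx)
    rho = np.exp(np.linspace(0, np.log(r_max), n_rho))
    theta = np.linspace(0, np.pi, n_theta, endpoint=False)  # magnitude symmetry
    ys = cy + rho[:, None] * np.sin(theta)
    xs = cx + rho[:, None] * np.cos(theta)
    return map_coordinates(mag, [ys, xs], order=1)

def rotation_scale(a, b):
    """Estimate (angle, scale) between b and a by phase correlation of the
    log-polar spectral magnitudes (the Fourier-Mellin idea of Section 4.5)."""
    A, B = logpolar_magnitude(a), logpolar_magnitude(b)
    corr = np.fft.ifft2(np.fft.fft2(A) * np.conj(np.fft.fft2(B)))
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    n_rho, n_theta = A.shape
    if i > n_rho // 2: i -= n_rho            # wrap negative shifts
    if j > n_theta // 2: j -= n_theta
    angle = j * np.pi / n_theta              # one theta bin = pi/n_theta rad
    scale = np.exp(i * np.log(min(a.shape) / 2.0) / n_rho)  # approx. bin width
    return angle, scale
```

In practice one would validate this by comparing an image with, e.g., a scipy.ndimage.rotate-d copy of itself and checking that the recovered angle matches, up to the sign conventions of the correlation peak.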
4.6. Straight Line Invariancy

A generic straight line in the cartesian plane can be overlapped onto any other one by combining a rotation and a translation. If we note that a translation of a straight line of infinite length and zero width can also be seen as a scaling, then a remapped straight line does not change its shape in the log-polar domain. If we consider a generic straight line in the cartesian domain and its corresponding transform:

$$ f_1(x) = \alpha x + \beta \;\Leftrightarrow\; f_{1LP}(\theta) = \log_a \frac{\beta}{r_0\left(\sin\theta - \alpha\cos\theta\right)} \tag{4.16} $$

we can easily see that another generic straight line is:

$$ f_2(x) = \alpha' x + \beta' \;\Leftrightarrow\; f_{2LP}(\theta) = \log_a \frac{\beta'}{r_0\left(\sin\theta - \alpha'\cos\theta\right)} \tag{4.17} $$

Now we perform a pure rotation by θ0, i.e., we set the following values:

$$ \alpha' = \frac{\alpha\cos\theta_0 + \sin\theta_0}{\cos\theta_0 - \alpha\sin\theta_0}, \qquad \beta' = \frac{\beta}{\cos\theta_0 - \alpha\sin\theta_0} \tag{4.18} $$

Then we get:

$$ f_2(x) = \alpha' x + \beta' \;\Leftrightarrow\; f_{2LP}(\theta) = \log_a \frac{\beta'}{r_0\left(\sin\theta - \alpha'\cos\theta\right)} = f_{1LP}(\theta - \theta_0) \tag{4.19} $$

which is a translation along the θ axis. Then, considering a pure translation:

$$ f_3(x) = \alpha' x + \beta' + k \;\Leftrightarrow\; f_{3LP}(\theta) = \log_a \frac{\beta' + k}{r_0\left(\sin\theta - \alpha'\cos\theta\right)} \tag{4.20} $$

Since α′, β′, and k are constants, we can introduce another constant λ = (β′ + k)/β′, so that equation (4.20) becomes:

$$ f_3(x) = \alpha' x + \beta' + k \;\Leftrightarrow\; f_{3LP}(\theta) = f_{1LP}(\theta - \theta_0) + \psi, \qquad \psi = \log_a \lambda \tag{4.21} $$

This is a pure translation of the original straight-line function, translated by θ0 horizontally and by ψ vertically, Fig. 4.6. It is interesting to note that if we set in (4.16):

$$ v = \operatorname{arccot}(-\alpha), \qquad w = \log_a \frac{\beta\sin v}{r_0} \tag{4.22} $$

the equation becomes:

$$ f_{1LP}(\theta) = w - \log_a\!\left[\cos(\theta - v)\right] \tag{4.23} $$

where P = (w, v) is the point on the line closest to the origin of the log-polar mapping. This point is unique and uniquely defines the line itself. This feature is extremely important in various fields. Since it makes the detection of straight lines easier, it can be used in object recognition applications when the object of interest has straight borders. Bishay [26] has taken advantage of this property to detect object edges of unknown orientation in an indoor scene. Another application where the invariance of straight lines is helpful is, in stereo vision, the detection of the epipolar lines in a stereo pair [27].

Figure 4.6. Straight Line Invariancy: The shape of a straight line after a log-polar transformation is invariant to its position and orientation. (a) Cartesian domain. (b) Log-polar domain.

4.7. Circumference Invariancy

In general, the shape of a circumference is not invariant in the log-polar domain, but it is interesting to note that if we change the sign of the equation of the generic straight line seen in the previous section:

$$ f_{LP}(\theta) = -\log_a \frac{\beta}{r_0\left(\sin\theta - \alpha\cos\theta\right)} = \log_a \frac{r_0\left(\sin\theta - \alpha\cos\theta\right)}{\beta} \tag{4.24} $$

and then transform it back to the cartesian domain, we get:

$$ f(x, y) = \beta x^2 + \beta y^2 + \alpha r_0^2\, x - r_0^2\, y = 0 \tag{4.25} $$

which is the equation of a circumference passing through the center of the mapping, centered at:

$$ C_x = -\frac{\alpha r_0^2}{2\beta}, \qquad C_y = \frac{r_0^2}{2\beta} \tag{4.26} $$

with radius:

$$ R = \frac{r_0^2}{2\beta}\sqrt{\alpha^2 + 1} \tag{4.27} $$

as shown in Fig. 4.7.

Figure 4.7. Circumference Invariancy: The shape of a circumference passing through the origin of the mapping, after a log-polar transformation, is invariant to its position and orientation. (a) Cartesian domain. (b) Log-polar domain.

Another significant family of circumferences is the set of all circles centered on the origin of the mapping. Since r is constant for every value of θ, if we transform the equation:

$$ f(x, y) = x^2 + y^2 - R^2 \tag{4.28} $$

into log-polar coordinates, we get:

$$ f_{LP}(\rho, \theta) = \rho = \log_a \frac{R}{r_0} \tag{4.29} $$

Obviously this is a constant expression, which is mapped into a horizontal straight line, as in Fig. 4.8. For details about the straight-line and circumference invariances see [28].

Figure 4.8. Circumference Centered in the Origin: The shape of a circumference, centered in the origin of the mapping, after a log-polar transformation is a horizontal straight line. (a) Cartesian domain. (b) Log-polar domain.
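The reconstruction of Eqs. (4.24)–(4.27) can be sanity-checked numerically: the sketch below (natural-base logarithm and arbitrary illustrative parameters) maps the sign-flipped line curve back to cartesian coordinates and confirms that the points lie on the stated circle through the origin.

```python
import numpy as np

# Verify Eqs. (4.24)-(4.27): the sign-flipped line curve, mapped back to
# cartesian coordinates, lies on a circle through the origin.
# (Natural-base logarithm assumed for simplicity, i.e. a = e.)
alpha, beta, r0 = 0.8, 1.5, 1.0
theta = np.linspace(0.9, 2.2, 200)   # range where sin(t) - alpha*cos(t) > 0
rho = np.log(r0 * (np.sin(theta) - alpha * np.cos(theta)) / beta)
r = r0 * np.exp(rho)                  # inverse mapping, Eq. (2.6)
x, y = r * np.cos(theta), r * np.sin(theta)

cx, cy = -alpha * r0**2 / (2 * beta), r0**2 / (2 * beta)  # center, Eq. (4.26)
R = (r0**2 / (2 * beta)) * np.sqrt(alpha**2 + 1)          # radius, Eq. (4.27)
print(np.allclose(np.hypot(x - cx, y - cy), R))           # True
```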
4.8. The Log-Polar Hough Transform

As shown above, both straight lines and circumferences passing through the origin are invariant in the log-polar space. A very useful tool for detecting these families of curves is the Hough transform [29], a method for finding simple shapes (generally straight lines) in an image. It sums the number of pixels that are in support of a particular line, defined in a uniformly discretized parametric Hough space. The transform maps points in an image to sinusoidal curves in Hough space; in the ideal case, each curve represents the family of straight lines that could pass through the corresponding point. The standard form of the transform is the equation of a straight line, parameterized in polar coordinates:

$$ d = x_i\cos\phi + y_i\sin\phi = r_i\cos(\theta_i - \phi) \tag{4.30} $$

where (x_i, y_i) are the pixel cartesian coordinates, (r_i, θ_i) are the pixel polar coordinates, d is the closest distance of the line to the origin, and φ the direction of the normal to the line. Using this transform, any edge point that appears in the image votes for all lines that could possibly pass through that point. In this way, if there is a real line in the image, by transforming all of its points to Hough space we accumulate a large number of votes for the actual line, and only one for any other line (assuming no noise) in the image. The vast majority of research on the Hough transform has treated sensors with a uniform distribution of sensing elements; for space-variant sensors, various approaches have been tried. Weiman [30] has proposed a log-Hough transform, where equation (4.30) becomes:

$$ \log_a \frac{d}{r_0} = \log_a \frac{r_i}{r_0} + \log_a \cos(\theta_i - \phi) = \rho_i + \log_a \cos(\theta_i - \phi) \tag{4.31} $$

Barnes [31] has investigated the log-Hough transform in the real (discrete) case of images acquired by a log-polar sensor.

4.9. Spirals

The last geometrical figure with some interesting characteristics is the logarithmic spiral. The parametric equations of such a spiral are:

$$ x = \alpha\cos t \cdot e^{\beta t}, \qquad y = \alpha\sin t \cdot e^{\beta t} \tag{4.32} $$

with α and β arbitrary parameters. The log-polar transform of (4.32) gives:

$$ \rho(\theta) = \log_a \frac{\alpha\, e^{\beta\theta}}{r_0} \tag{4.33} $$

If we set in (4.33):

$$ k_1 = \ln\frac{\alpha}{r_0}, \qquad k_2 = \frac{1}{\ln a} \tag{4.34} $$

then we get:

$$ \rho = k_2\left(k_1 + \beta\theta\right) \tag{4.35} $$

which is the equation of a straight line (Fig. 4.9).

Figure 4.9. Logarithmic Spiral: Such a spiral, centered in the origin of the mapping, becomes a straight line after the log-polar transformation. (a) Cartesian domain. (b) Log-polar domain. Please note that in our representation we display the set of points θ0 + 2kπ all on θ0.
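Returning to the log-Hough transform of Eq. (4.31), a minimal accumulator can be written directly from that relation; the discretization of the (φ, log d) space and the synthetic test line below are illustrative assumptions of this sketch.

```python
import numpy as np

def log_hough(points_rho_theta, n_phi=180, n_d=120, d_range=(-3.0, 3.0)):
    """Accumulate the log-Hough transform of Eq. (4.31):
    log d = log r + log cos(theta - phi), one vote per (point, phi)."""
    acc = np.zeros((n_d, n_phi), dtype=int)
    phis = np.linspace(0, np.pi, n_phi, endpoint=False)
    for log_r, theta in points_rho_theta:
        c = np.cos(theta - phis)
        valid = c > 1e-9                      # cosine must be positive
        log_d = log_r + np.log(c[valid])
        span = d_range[1] - d_range[0]
        bins = np.round((log_d - d_range[0]) / span * (n_d - 1)).astype(int)
        ok = (bins >= 0) & (bins < n_d)
        np.add.at(acc, (bins[ok], np.nonzero(valid)[0][ok]), 1)
    return acc, phis

# Points on the line x*cos(0.6) + y*sin(0.6) = 2, given as (log r, theta):
ts = np.linspace(-4, 4, 60)
x = 2 * np.cos(0.6) - ts * np.sin(0.6)
y = 2 * np.sin(0.6) + ts * np.cos(0.6)
pts = np.stack([0.5 * np.log(x**2 + y**2), np.arctan2(y, x)], axis=1)
acc, phis = log_hough(pts)
d_bin, phi_bin = np.unravel_index(acc.argmax(), acc.shape)
print(phis[phi_bin])   # ~0.6: the recovered normal direction of the line
```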
5. APPLICATIONS

5.1. Panoramic View

Traditionally, the acquisition of real-time panoramic images has been performed using lenses or mirrors coupled with standard image sensors, but this solution has the problems of producing different resolutions in different parts of the resulting images and of requiring both the acquisition and the processing of a huge number of pixels. The main objective of the OMNIVIEWS project [32] was to integrate optical, hardware, and software technology for the realization of a smart visual sensor, and to demonstrate its utility in key application areas. In particular, the intention was to design and realize a low-cost, miniaturized digital camera acquiring panoramic (360°) images and performing useful low-level processing on the incoming stream of images, Fig. 5.1. The solution proposed in OMNIVIEWS was to integrate a retina-like CMOS visual sensor with a mirror of specially designed, matching curvature. This matching, when feasible, provides panoramic images without the need for computationally intensive processing and/or the hardware remapper required by conventional omnidirectional cameras, therefore reducing overall cost, size, energy consumption, and computational power with respect to currently used devices.

Figure 5.1. Panoramic View: (a) Image acquired by an OMNIVIEWS log-polar camera (software simulation). (b) Image acquired by a conventional omnidirectional camera. Note that the image from the OMNIVIEWS camera is immediately understandable, while the image from a conventional camera requires more than 1.5 million operations to be transformed into something similar, with no added advantage.

The panoramic images obtained with log-polar technology are not only equivalent to those obtained with conventional devices; they can also be obtained at no computational cost. For example, with our current prototype a panoramic image composed of about 27,000 pixels is obtained by simply reading out the pixels (i.e., 27,000 operations), while with a conventional solution the same image would require more than 1.7 million operations (about 50 times more). Besides that, unlike a warped traditional image, we get the interesting side effect of a uniform resolution along the entire panoramic image. The guiding principle is to design the profile of the mirror so that, if the camera is inserted inside a cylinder, the direct camera output provides an undistorted, constant-resolution image of the internal surface of the cylinder. The advantage of such an approach lies in providing the observer with a complete view of its surroundings in one image, which can be refreshed at video rate. For technical details about the design of the mirror, Fig. 5.2, see the OMNIVIEWS project report [33].

Figure 5.2. Mirror design: (a) Profile of the mirror. (b) The mirror is designed so that the vertical resolution of a cylindrical surface is mapped into constant radial resolution in the image plane.

Panoramic vision has many different applications, with or without a space-variant sensor but, as described above, the use of a log-polar sensor greatly improves the performance of the system. The most common usage of this technology (there are in fact several commercial products already available based on a camera and a panoramic mirror) is in the field of remote surveillance, where it is very useful to have a complete view of the whole environment at once, possibly coupled with a motorized standard camera to zoom in on objects of interest. Other applications involve biological vessels (panoramic endoscopies) or the inspection of industrial pipes, where a simultaneous view of the whole internal surface of the cylinder is important, and robotic navigation, where the advantage is that just one camera is needed to represent the whole surrounding environment [34].
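The "no computational cost" claim has a simple reading in code: with the matched mirror, each log-polar ring already samples one line of the unwrapped cylinder, so building the panorama is a pure readout and reshape. A minimal sketch (the buffer sizes are illustrative, chosen to match the roughly 27,000-pixel periphery):

```python
import numpy as np

# Illustrative readout geometry (~27,000 pixels: 110 rings x 252 per ring).
n_rings, n_wedges = 110, 252
raw = np.zeros(n_rings * n_wedges, dtype=np.uint8)  # stand-in readout buffer

# With the matched mirror, ring i samples one line of the unwrapped cylinder,
# so the panorama is just the readout buffer viewed as a 2-D image:
panorama = raw.reshape(n_rings, n_wedges)
print(panorama.shape)   # (110, 252): one operation per pixel, no remapping
```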
5.2. Robotic Vision

The need for real-time image processing is particularly relevant in visually guided robotics. It can be addressed both by increasing the computational power and by constraining the amount of data to be processed. We have seen that the log-polar mapping is able to limit the number of pixels while keeping both a wide field of view and a high resolution in the fovea. But this is not the only reason for such a choice: the biological motivation is very important too when dealing with human-like robots. Over the last few years we have studied how sensorimotor patterns are acquired in the human being, from the unique perspective of implementing the behaviors we wanted to study in artificial systems. The approach we have followed is biologically motivated from at least three perspectives: the morphology of the artificial being should be as close as possible to the human one, i.e., the sensors should approximate their biological counterparts; its physiology should be designed so that its control structures and processing are modeled after what is known about human perception and motor control; and its development, i.e., the acquisition of those sensorimotor patterns, should follow the process of biological development in the first few years of life. The goal has been that of understanding human sensorimotor coordination and cognition rather than building more efficient robots. So, instead of being interested in what a robot is able to do, we are interested in how it does it. In fact, some of the design choices might even be questionable from a purely engineering point of view, but they are pursued nonetheless because they improve the similarity with, and hence the understanding of, biological systems.

Along this line of investigating human cognitive processes, our group implemented various artifacts both in software (e.g., image processing and machine learning) and hardware (e.g., silicon implementations of the visual system, robotic heads, and robotic bodies). Most of the work has been carried out on a humanoid robot called Babybot [35, 36], which resembles the human body from the waist up, although in simplified form, Fig. 5.3. It has eighteen degrees of freedom overall, distributed between the head, the arm, the torso, and the hand. Its sensory system consists of cameras (the eyes), gyroscopes (the vestibular system), microphones (the ears), position sensors at the joints (proprioception), and tactile sensors on the palm of the hand. During 2004 a new humanoid was being developed: it will have 23 degrees of freedom, it will be smaller, and all its parts are designed with the final goal of the best possible similarity with their biological counterparts. Investigations have touched aspects such as the integration of visual and inertial information [37] and the interaction between vision and spatial hearing [38].

Figure 5.3. LIRA-Lab Humanoid: Babybot.

It is thus very important to understand why the visual system of the robot has to be as similar as possible to human vision: we have to find out whether the morphology of the visual sensors can improve particular sensorimotor strategies, and how vision can affect behaviors that are not purely visual in nature. We should note again that eyes and motor behaviors coevolved: it is useless to have a fovea if the eyes cannot be swiftly moved over possible targets. Humans developed a sophisticated oculomotor apparatus that includes saccadic movements, smooth tracking, vergence, and various combinations of retinal and extra-retinal signals to maintain vision efficient in a wide variety of situations.

As stated above, handling log-polar images can sometimes be awkward and time consuming, due to the deformation of the plane, but these drawbacks are always balanced by the highly reduced number of pixels that have to be processed. Various algorithms have been developed to perform common tasks in log-polar geometry, like vergence control and disparity estimation [39]. In this last case, grossly simplifying, the algorithm employs a correlation measure to evaluate the similarity of the left and right images for different horizontal shifts, and finally picks the shift of maximum correlation as a measure of the binocular disparity. The log-polar geometry, in this case, weighs the pixels in the fovea differently from those in the periphery: more importance is thus accorded to the object being tracked. Positional information is important, but for a few tasks optic flow is a better choice. One example of the use of optic flow is the dynamic control of vergence, as in [40]; we implemented a log-polar version of a quite standard algorithm [41], and the details of the implementation can be found in [42]. The estimation of the optic flow requires taking the log-polar geometry into account because it involves non-local operations.
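Grossly simplified, the correlation-and-shift search described above might look as follows; the weighting scheme, the shift window, and the plain sum-of-products correlation are illustrative assumptions of this sketch, not the algorithm of [39].

```python
import numpy as np

def disparity(left, right, max_shift=10):
    """Correlation-based disparity, grossly simplified: test horizontal
    shifts of the right image and keep the best-correlating one. Pixels
    near the image center get higher weight, mimicking the foveal
    emphasis that the log-polar geometry provides for free."""
    h, w = left.shape
    y, x = np.mgrid[0:h, 0:w]
    weights = 1.0 / (1.0 + np.hypot(y - h / 2, x - w / 2))  # ~1/r density
    best, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        score = np.sum(weights * left * np.roll(right, s, axis=1))
        if score > best_score:
            best, best_score = s, score
    return best

rng = np.random.default_rng(2)
left = rng.random((64, 64))
right = np.roll(left, -5, axis=1)   # right view: scene shifted by 5 pixels
print(disparity(left, right))       # 5: the shift that re-aligns the pair
```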
5.3. Video Conferencing, Video Telephony and Image Compression

The log-polar transform of an image can be seen as a lossy image-compression algorithm, where all the data loss is concentrated where the information should be least attractive to the user. With our family of sensors, the compression rate for a single frame is between 30 and 40 to 1, given (see Table 1) by:

$$ \frac{Q^2}{N_{LP}} \tag{5.1} $$

where N_LP is the total number of pixels in the log-polar image. As we have seen, some algorithms can still be applied with no relevant change to log-polar images, so the image, or the video stream, can then be processed with virtually all the commonly available compression algorithms. This impressive compression factor can be used to send video streams over very narrow-band channels. One possible application was investigated in the past few years within the EU-funded project IBIDEM [43]. This project was inspired by the fact that hearing impairment prevents many people from using normal voice telephones, for obvious reasons. A solution to this problem for the hearing impaired is the use of videophones; but the videophones then available on standard telephone lines (Public Switched Telephone Network) did not meet the dynamic requirements necessary for lip reading, finger spelling, and signing, and their spatial resolution was also too small. The main objective of IBIDEM was to develop a videophone useful for lip reading by hearing-impaired people, based on the space-variant sensor and using standard telephone lines. The space-variant nature of the sensor made it possible to have high resolution in the area of interest, lips or fingers, while still maintaining a wide field of view (in order to perceive, for example, the facial expression of the interlocutor) and drastically reducing the amount of data to be sent over the line (see Fig. 5.4).

Figure 5.4. IBIDEM project finger spelling experiment: (a) Log-polar and (b) remapped images.

After the IBIDEM project, its natural continuation was to test the performance of the log-polar cameras (which have been called Giotto, see Fig. 5.5) on even narrower bandwidths. Extensive experiments on wireless image transmission were conducted with a setup composed of a remote PC running a web server embedded into an application that acquires images from the retina-like camera and compresses them following one of the recommendations for video coding over low-bit-rate communication lines (H.263 in our case). The remote receiving station was a palmtop PC acting as a client, connected to the remote server through a dial-up GSM connection (9600 baud). Using a standard browser interface, the client could connect to the web server, receive the compressed stream, decompress it, and display the resulting images on the screen. Due to the low amount of data to be processed and sent over the line, frame rates of up to four images per second could be obtained.
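Plugging the Table 1 values for the 33k sensor into Eq. (5.1) reproduces the quoted range:

```python
# Single-frame compression factor of Eq. (5.1) for the CMOS 33k sensor.
Q = 1100           # spatial quality parameter (Table 1)
N_LP = 33193       # total number of log-polar pixels (Table 1)
print(Q**2 / N_LP)  # ~36: between 30 and 40 to 1, as stated in the text
```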
The only special-purpose hardware required is the log-polar camera; coding/decoding and image remapping are done in software on a desktop PC (on the server side) and on the palmtop PC (on the client side). The aspects we wanted to stress in these experiments are the use of off-the-shelf components and the overall physical size of the receiver. This performance, in terms of frame rate, image quality, and cost, clearly cannot be matched using conventional cameras. More recently (within a project called AMOVITE) we started realizing a portable camera that can be connected to the palmtop PC, allowing bi-directional image transmission through GSM or GPRS communication lines. The sensor itself is not much different from the one previously described, apart from the adoption of a companion chip allowing a much smaller camera. The same principle has been adopted in other image-transmission experiments: Comaniciu [44] has added a face-tracking algorithm to the software log-polar compression for remote-surveillance purposes, and Weiman [45], combining the log-polar transform with other compression algorithms, has reached a compression factor of 1600:1.

Figure 5.5. Giotto Camera.

GLOSSARY

Active Vision The control of the optics and the mechanical structure of cameras or eyes to simplify the processing for vision.

CCD (Charge-Coupled Device) A semiconductor technology used to build light-sensitive electronic devices such as cameras and image scanners. Such devices may detect either colour or black-and-white. Each CCD chip consists of an array of light-sensitive photocells; each photocell is sensitized by giving it an electrical charge prior to exposure.

CMOS (Complementary Metal Oxide Semiconductor) A semiconductor fabrication technology using a combination of n- and p-doped semiconductor material to achieve low power dissipation. Any path through a gate through which current can flow includes both n- and p-type transistors. Only one type is turned on in any stable state, so there is no static power dissipation, and current only flows when a gate switches, in order to charge the parasitic capacitance.

Cone A specialized nerve cell in the retina, which detects color.

Field Of View (FOV) An angle that defines how far from the optical axis the view extends.

Fovea A high-resolution area of the retina, usually located in the center of the field of view.

Foveola The central part of the fovea.

Optical Axis An imaginary line that runs through the focus and center of a lens.

Photoreceptor A mechanism that emits an electrical or chemical signal that varies in proportion to the amount of light striking it.

Receptive Field The portion of the sensory surface where stimuli affect the activity of a sensory neuron.

Retina The light-sensitive layer of tissue that lines the back of the eyeball, sending visual impulses through the optic nerve to the brain.

Rod A light-detecting cell in the retina; it detects light and movement, but not color.

Saccadic Movement A rapid eye movement used to alter eye position within the orbit, causing a rapid adjustment of the fixation point to different positions in the visual world.

Vergence The angle between the optical axes of the eyes.

REFERENCES

1. M. V. Srinivasan and S. Venkatesh, "From Living Eyes to Seeing Machines." Oxford University Press, Oxford, 1997.
2. P. M. Blough, "Neural Mechanisms of Behavior in the Pigeon." Plenum Press, New York, 1979.
3. Y. Galifret, Z. Zellforsch. 86, 535 (1968).
4. J. B. Jonas, U. Schneider, and G. O. Naumann, Graefes Arch. Clin. Exp. Ophthalmol. 230, 505 (1992).
5. R. H. S. Carpenter, "Movements of the Eyes," 2nd Edn. Pion Limited, London, 1988.
6. G. Sandini and V. Tagliasco, Comp. Vision Graph. 14, 365 (1980).
7. E. L. Schwartz, Biol. Cybern. 37, 63 (1980).
8. C. F. R. Weiman and G. Chaikin, Computer Graphics and Image Processing 11, 197 (1979).
9. E. L. Schwartz, Biol. Cybern. 25, 181 (1977).
10. C. F. R. Weiman, Proc. SPIE 938 (1988).
11. R. A. Peters II, M. Bishay, and T. Rogers, Tech. Rep., Intelligent Robotics Laboratory, Vanderbilt University, 1996.
12. T. Fisher and R. Juday, Proc. SPIE 938 (1988).
13. G. Engel, D. N. Greve, J. M. Lubin, and E. L. Schwartz, ICPR (1994).
14. A. Rojer and E. L. Schwartz, ICPR (1990).
15. R. S. Wallace, P. W. Ong, B. B. Bederson, and E. L. Schwartz, Int. J. Comput. Vision 13, 71 (1994).
16. C. F. R. Weiman and R. D. Juday, Proc. SPIE 1192 (1990).
17. R. Suematsu and H. Yamada, Trans. SICE 31(10), 1556 (1995).
18. Y. Kuniyoshi, N. Kita, S. Rougeaux, and T. Suehiro, ACCV (1995).
19. T. Baron, M. D. Levine, and Y. Yeshurun, ICPR (1994).
20. T. Baron, M. D. Levine, V. Hayward, M. Bolduc, and D. A. Grant, Proc. CASI (1995).
21. R. Wodnicki, G. W. Roberts, and M. D. Levine, IEEE J. Solid-State Circuits 32(8), 1274 (1997).
22. J. Van der Spiegel, G. Kreider, C. Claeys, I. Debusschere, G. Sandini, P. Dario, F. Fantini, P. Bellutti, and G. Soncini, in "Analog VLSI Implementation of Neural Systems" (C. Mead and M. Ismail, Eds.). Kluwer, Boston, 1989.
23. E. W. Weisstein, "Conformal Mapping." From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/ConformalMapping.html
24. L. Grady, Ph.D. Thesis, 2004.
25. R. Schalkoff, "Digital Image Processing and Computer Vision." Wiley and Sons, New York, 1989.
26. M. Bishay, R. A. Peters II, and K. Kawamura, ICRA (1994).
27. K. Schindler and H. Bischof, ICPR (2004).
28. D. Young, BMVC (2000).
29. P. V. C. Hough, U.S. Patent No. 3,069,654 (1962).
30. C. F. R. Weiman, Phase I SBIR Final Report, HelpMate Robotics Inc., 1994.
31. N. Barnes, Proc. OMNIVIS'04 Workshop at ECCV (2004).
32. G. Sandini, J. Santos-Victor, T. Pajdla, and F. Berton, IEEE Sensors (2002).
33. S. Gächter, FET Project No. IST-1999-29017 (2001).
34. J. Santos-Victor and A. Bernardino, in "Robotics Research, 10th International Symposium" (R. Jarvis and A. Zelinsky, Eds.). Springer, 2003.
35. G. Metta, Ph.D. Thesis, 2000.
36. G. Metta, G. Sandini, and J. Konczak, Neural Networks 12, 1413 (1999).
37. F. Panerai, G. Metta, and G. Sandini, Robot. Auton. Syst. 30, 195 (2000).
38. L. Natale, G. Metta, and G. Sandini, Robot. Auton. Syst. 39, 87 (2002).
39. R. Manzotti, A. Gasteratos, G. Metta, and G. Sandini, Comput. Vis. Image Und. 83, 97 (2001).
40. C. Capurro, F. Panerai, and G. Sandini, Int. J. Comput. Vision 24, 79 (1997).
41. J. Koenderink and J. Van Doorn, J. Optical Soc. Am. A 8, 377 (1991).
42. H. Tunley and D. Young, ECCV (1994).
43. F. Ferrari, J. Nielsen, P. Questa, and G. Sandini, Sensor Review 15, 17 (1995).
44. D. Comaniciu, F. Berton, and V. Ramesh, Real-Time Imaging 8, 427 (2002).
45. C. F. R. Weiman, Proc. SPIE 1295, 266 (1990).