Analog Integrated Circuits and Signal Processing, 39, 267–273, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Neuromorphic Sound Localizer for a Smart MEMS System

ANDRÉ VAN SCHAIK (1) AND SHIHAB SHAMMA (2)

(1) School of Electrical and Information Engineering, University of Sydney, Sydney, NSW 2006, Australia
(2) Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
E-mail: [email protected]

Received November 13, 2002; Revised July 28, 2003; Accepted September 25, 2003

Abstract. In this paper we present an analog circuit that determines the direction of incoming sound using two microphones. The circuit is inspired by biology and uses two silicon cochleae to determine the azimuthal angle of the sound source with respect to the axis of the two microphones, using the time difference between the two microphone signals. A new algorithm, adapted to an analog VLSI implementation, is presented together with simulation and measurement results.

Key Words: sound localization, neuromorphic, analog VLSI, silicon cochlea, zero-crossing detector

1. Introduction

Air-coupled acoustic MEMS offer exciting opportunities for a wide range of applications for robust sound detection, analysis, and recognition in noisy environments. The most important advance these sensors offer is the potential for fabricating and utilizing miniature, low-power, and intelligent sensor elements and arrays. In particular, MEMS make it possible for the first time to conceive of applications that employ arrays of interacting micro-sensors, creating in effect spatially distributed sensory fields. To achieve this potential, however, it is essential that these sensors be coupled to signal conditioning and processing circuitry that can tolerate their inherent noise and environmental sensitivity without sacrificing the unique advantages of compactness and efficiency.

The authors, together with several colleagues, are currently focusing their efforts on developing a smart microphone suitable for outdoor acoustic surveillance on robotic vehicles. This smart microphone will incorporate MEMS sensors for acoustic sensing and adaptive noise-reduction circuitry. These intelligent and noise-robust interface capabilities will enable a new class of small, effective, air-coupled surveillance sensors, which will be small enough to be mounted on future robots and will consume less power than current systems. By including silicon-cochlea-based detection, classification, and localization processing, these sensors can perform end-to-end acoustic surveillance. The resulting smart microphone technology will be very power efficient, enabling a networked array of autonomous sensors that can be deployed in the field.

We envision such a sensory processing system to be fully integrated, with sophisticated capabilities beyond the passive sound reception of typical microphones. Smart MEMS sensors may possess a wide range of intelligent capabilities depending on the specific application; e.g., they may simply extract and transmit elementary acoustic features (sound loudness, pitch, or location), or learn and perform high-level decisions and recognition. To achieve these goals, we aim to develop and utilize novel technologies that can perform these functions robustly, inexpensively, and at extremely low power. An equally important issue is the formulation of algorithms that are intrinsically matched to the characteristic strengths and weaknesses of this technology.
In this paper we present an implementation of one such algorithm, which is inspired by biology but adapted to the strengths and weaknesses of analog VLSI, for localizing sounds in the horizontal plane using two MEMS microphones.

2. The Algorithm

Humans rely heavily on the Interaural Time Difference (ITD) for localization of sounds in the horizontal plane. When a sound source is in line with the axis through both ears, sound will reach the furthest ear with a certain delay after reaching the closest ear. To a first approximation, ignoring diffraction effects around the head, this time delay is equal to the distance between the ears divided by the speed of sound. On the other hand, if the sound source is straight ahead of or behind the listener, it will reach both ears at the same time. In between, the ITD varies as the sine of the angle of incidence of the sound.

The most common engineering method for determining the time difference between two signals is to look for the delay at which there is a peak in the cross-correlation of the two signals. In biology, a similar strategy is used, known as Jeffress' coincidence model [1]. In the ear, sound captured by the ear-drum is transmitted to the cochlea via the middle-ear bones. The fluid-filled cochlea is divided into two parts by a flexible membrane, the basilar membrane, whose mechanical properties are such that high-frequency sound makes the start of the membrane vibrate most, whereas low-frequency sound makes the end vibrate most. Inner Hair Cells on the basilar membrane transduce this vibration into a neural signal. For frequencies below 1–2 kHz, the spikes generated on the auditory nerve are phase-locked to the vibration of the basilar membrane and therefore to the input signal. In Jeffress' model, this phase-locking is used together with neural delay lines to extract the interaural time difference in each frequency band. Delayed spikes from one ear are compared with the spikes from the other ear and coincidences are detected. The position along the delay line where the spikes coincide is a measure of the ITD.

A hardware implementation of this model has been developed by Lazzaro [2]. Such a hardware implementation requires delay lines whose maximum delay equals the largest time difference expected and whose minimum delay step equals the resolution needed. This has to be done at each cochlear output, which makes the model rather large. An alternative approach uses the fact that a silicon cochlea itself functions not only as a cascade of filters, but also as a delay line, since each filter adds a certain delay. Cross-correlation of the outputs of two cochleae, one for each ear, will thus give us information about the ITD [3]. However, the delays are proportional to the inverse of the cut-off frequencies of the filters and are therefore scaled exponentially. This makes obtaining an actual ITD estimate from a silicon implementation of this algorithm rather tricky [4].

[Fig. 1. Signals in the sound localizer algorithm.]

Instead of the algorithms discussed above, we have developed an algorithm that is adapted to an aVLSI implementation. The algorithm is illustrated in Fig. 1. First, the left and right signals are digitized by detecting whether they are above or below zero. Next, the delay between the positive zero-crossings in the two signals is detected and a pulse is created with a width equal to this delay. Finally, a known constant current is integrated on a capacitor for the duration of the pulse, so that the change in voltage is proportional to the pulse width. A voltage proportional to the average pulse width can be obtained by integrating over a fixed number of pulses. In our implementation, separate pulses are created for a left-leading signal and for a right-leading signal. The left-leading pulses increase the capacitor voltage, whereas the right-leading pulses decrease it. Once a fixed number of pulses has been counted, the capacitor voltage is read and reset to its initial value.
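The following behavioral sketch in Python (ours, not the authors' Matlab code or the chip itself) illustrates the scheme. The function name estimate_itd, the sample rate, and the index-wise pairing of zero-crossings are our simplifications; the pairing in particular assumes a clean, periodic signal, which is what the enable logic described in Section 3.2 works to guarantee on chip.

    import numpy as np

    def estimate_itd(left, right, fs, n_pulses=64):
        """Zero-crossing ITD estimator: reduce each signal to its sign,
        pair up positive zero-crossings, and accumulate the signed delay
        between each pair, as a capacitor would integrate the pulses."""
        l_pos = left > 0
        r_pos = right > 0
        # positive zero-crossings: samples where the sign flips from low to high
        l_zc = np.flatnonzero(~l_pos[:-1] & l_pos[1:]) + 1
        r_zc = np.flatnonzero(~r_pos[:-1] & r_pos[1:]) + 1
        acc, count = 0.0, 0
        for lz, rz in zip(l_zc, r_zc):
            # left-leading pulses raise the "capacitor voltage",
            # right-leading pulses lower it
            acc += (rz - lz) / fs
            count += 1
            if count == n_pulses:   # read out after a fixed number of pulses
                break
        return acc / max(count, 1)  # average pulse width = ITD estimate

    # Example: a 200 Hz tone with the right channel lagging by 100 us
    fs = 1_000_000                  # microsecond delays need a fine time grid
    t = np.arange(fs) / fs          # 1 s of signal
    left = np.sin(2 * np.pi * 200 * t)
    right = np.sin(2 * np.pi * 200 * (t - 100e-6))
    print(estimate_itd(left, right, fs))  # ~1e-4 s (left leading)

Note the 1 MHz sample rate in the example: as discussed below, extracting microsecond delays requires a very fine time grid, which is precisely what makes the digital simulation expensive and the analog implementation attractive.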
This algorithm was simulated in Matlab using sound files that were recorded in the field with the MEMS microphones that we intend to use in the final system. The sounds to localize can best be described as low-frequency noise, with most of the energy between 50 and 300 Hz. Furthermore, as the microphones are only a few centimeters apart, the ITD for a sound played from 1 degree of angle in front is about 4 µs, which corresponds to a phase shift of much less than 0.1% of a period for the lowest frequency components. The results of these tests are shown in Fig. 2. It can be seen that the average value of the ITD estimate (thick line) corresponds well with the theoretical curve (dashed line), but that the standard deviation of the responses is rather high.

[Fig. 2. Results of simulation of the algorithm. The thick line shows the average value of the time difference estimate. Each estimate is the average value for one second. As the sound files are 30 s long, a total of 30 estimates is obtained. The error bars indicate one standard deviation and the dashed line indicates a theoretical fit proportional to sin(angle).]

In reality the algorithm is not applied directly to the input signal, but pair-wise to the outputs of two cochlear models containing 32 sections each, i.e., a given frequency band from one cochlea is compared with the same frequency band of the other cochlea. Each cochlear section has a band-pass filter characteristic, and the best frequencies of the 32 filters are scaled exponentially between 300 and 60 Hz. Band-pass filtering increases the periodicity of the signals that the algorithm is applied to, which improves its performance. The results of the Matlab simulation of the algorithm including Lyon's cochlear model [5] are shown in Fig. 3. Using the cochlea and averaging the ITD estimate over all sections reduces the standard deviation of the response somewhat, but at the same time the mean is slightly less close to the ideal curve. A problem with both versions of the Matlab simulation is the very fine time resolution needed in order to extract delays on the order of microseconds.

[Fig. 3. Same as Fig. 2, but using two silicon cochleae with 32 sections each. The average is obtained by averaging over one second and over all 32 sections.]
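As a rough sketch of this per-band version (again ours, not the paper's Matlab code): first-order Butterworth band-pass filters from SciPy stand in for the sections of Lyon's cochlear model, the band edges at bf/1.3 and bf*1.3 are arbitrary, estimate_itd is the sketch from Section 2, and we average over all sections rather than only the "active" ones used on chip.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def cochlear_itd(left, right, fs, n_sections=32, f_top=300.0, f_bottom=60.0):
        """Per-band ITD estimation, averaged over sections (cf. Fig. 3)."""
        # best frequencies scaled exponentially between 300 and 60 Hz
        ratio = (f_bottom / f_top) ** (1.0 / (n_sections - 1))
        best_freqs = f_top * ratio ** np.arange(n_sections)
        estimates = []
        for bf in best_freqs:
            # crude band-pass stand-in for one section of the cochlear model
            sos = butter(1, [bf / 1.3, bf * 1.3], btype="bandpass",
                         fs=fs, output="sos")
            estimates.append(estimate_itd(sosfilt(sos, left),
                                          sosfilt(sos, right), fs))
        # the chip averages only over "active" sections; we average over all
        return float(np.mean(estimates))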
3. The Implementation

The hardware implementation of our algorithm uses two identical silicon cochleae with 32 sections each. At each of the 32 sections, the outputs of both cochleae are used to create digital pulses that are as wide as the time delay between the two signals. This time delay is measured within each section and averaged over all active sections in order to obtain a global estimate. Inactive sections are sections that do not contain enough signal during the period over which the ITD is estimated; they are therefore not included in the global estimate. The total size of this implementation is 5 mm² in a 0.5 µm process, with 75% of the circuit area devoted to the implementation of the capacitors. If the circuit were to operate at the sound frequencies that humans use for ITD detection, the capacitor sizes could easily be reduced by a factor of 3, cutting the total circuit size in half.

3.1. The Silicon Cochlea

The silicon cochlea used is similar to the one we have presented in [6], which has already proven its use in other neuromorphic sound processing systems [7, 8]. The basic building block for the filters in this cochlear model is the transconductance amplifier, operated in weak inversion. For input voltages smaller than about 60 mVpp, the amplifier can be approximated as a linear transconductance:

    $I_{out} = g_m (V_{in+} - V_{in-})$    (1)

with the transconductance $g_m$ given by:

    $g_m = \frac{I_0}{2 n U_T}$    (2)

where $I_0$ is the bias current, $n$ is the slope factor, and the thermal voltage $U_T = kT/q = 25.6$ mV at room temperature.

It has been shown that if all three amplifiers in the second-order section (Fig. 4) are identical, the section may be stable for small signals, but will exhibit large-signal instability due to slew-rate limitations [9]. This can be solved by using a transconductance amplifier with a wider linear input range in the forward path [9]. This also allows larger input signals to be used, up to about 140 mVpp. Our silicon cochlea is implemented by cascading 32 of these second-order sections with exponentially decreasing cut-off frequencies. The exponential decrease is obtained by creating the bias currents of the second-order sections with CMOS Compatible Lateral Bipolar Transistors, as proposed in [6].
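As a quick numeric illustration of Eq. (2) (the values below are ours, not taken from the paper): for an assumed bias current $I_0 = 0.15$ nA and slope factor $n = 1.5$,

    $g_m = \frac{I_0}{2 n U_T} = \frac{0.15\,\mathrm{nA}}{2 \times 1.5 \times 25.6\,\mathrm{mV}} \approx 1.9\,\mathrm{nS}$

so a gm-C integrator loaded with an assumed $C = 1$ pF would have a cut-off frequency $f_c = g_m / (2 \pi C) \approx 310$ Hz, near the top of the 300–60 Hz range covered by the 32 sections; scaling $I_0$ down exponentially along the cascade lowers $f_c$ in the same exponential fashion.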
3.2. The ITD Detection

[Fig. 4. The second-order section used in the cochleae.]

The circuit of Fig. 4 implements a second-order low-pass filter, in which the output voltage, VC2, of each section is a low-pass filtered version of the section's input signal and serves as the input for the next section. A normalized band-pass output is obtained by subtracting VC2 from VC1. This signal is not created explicitly on chip. Instead, the signal is digitized using a comparator that detects when VC1 > VC2. The same is done for the second cochlea.

[Fig. 5. The zero-crossing detection circuit.]

The implementation of the zero-crossing detector (the comparator) is shown in Fig. 5. The zero-crossing detector consists of a differential pair, which turns the difference between VC1 and VC2 into a current, Icomp, and a series of three current-limited inverters that create a digital voltage representing the sign of Icomp. The first inverter in this series has negative feedback through a diode-connected NMOS and PMOS transistor. This negative feedback serves to keep the voltage at node Icomp constant at the inverter threshold, while changing the output, V1, up or down by about one MOS transistor threshold voltage, depending on the sign of Icomp. The following two inverters then amplify this change into proper digital voltage levels. The middle inverter in the series has a control signal, enableZCD, that allows this inverter to be switched off, pulling the voltage V2 high, which in turn forces the output, ZCD, low. The current limiting of the inverters, especially the first two in the series, is needed to limit power consumption, as their input levels are not proper digital voltage levels. Finally, Vp is a cascode voltage for the PMOS current mirror; as only the output branches are cascoded, it also offers limited control of the comparator offset.

[Fig. 6. The zero-crossing detection control circuit.]

A set-reset latch, shown in Fig. 6, generates the signal enableZCD that controls the zero-crossing detector. This latch first needs to be set to allow the comparison. This is done when Idiff, a copy of the current in the left branch of the differential pair of the comparator (see Fig. 5), is larger than a threshold current Ith in both left and right sections at the same time, where Ith is a current controlled by Vthres. This ensures that zero-crossings are only detected on signals for which VC1 − VC2 is well below zero in both sections just before the zero-crossing, which prevents the generation of the numerous zero-crossings that a small, noisy signal around zero would create, and improves the performance of the algorithm.

If the comparators are enabled and VC1 becomes larger than VC2 in the "left ear" cochlea, we detect a zero-crossing in the left-ear-leading channel, which we will call ZCDL = 1. The same is done for the right ear to detect ZCDR = 1. One of these two will occur first, depending on which ear is leading. When both ZCDR and ZCDL are high, the latch that enables the comparators is reset (Fig. 6), which in turn forces both ZCDL and ZCDR low (Fig. 5). In reality the feedback is so fast that whichever of ZCDL = 1 and ZCDR = 1 happens second never really goes high. Therefore, the zero-crossing is only detected in the leading channel and is only high for the duration of the time difference between the zero-crossings in the two channels.

Finally, the number of pulses created is counted, while for the duration of each pulse a fixed reference current is integrated onto a unit capacitor at each section. ZCDL pulses source the reference current onto the capacitor, whereas ZCDR pulses sink the current from the capacitor. When 64 pulses have been counted in a particular section, the capacitor in that section is connected to a common bus and pulse generation in that section is disabled (by setting finished high in Fig. 6) until a reset signal occurs, which resets the counter. As the capacitors in all sections that reach 64 pulses are connected to the bus, the total charge in all the active sections is shared across all unit capacitors in those sections, leading to an averaging of the capacitor voltage. This averaged voltage is an estimate of the ITD of the input sound. After this value is read out, a reset signal is given and a new estimate begins.
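The charge integration and charge-sharing readout can be mimicked in a few lines of Python; the reference current, unit capacitance, initial voltage, and rail voltages below are illustrative guesses rather than values from the chip.

    I_REF = 1e-9     # reference current (A); illustrative guess
    C_UNIT = 1e-12   # unit capacitor (F); illustrative guess
    V_INIT = 2.5     # initial/reset capacitor voltage (V); illustrative guess
    V_MIN, V_MAX = 0.0, 5.0  # supply rails, where the integrator saturates

    def section_voltage(signed_pulse_widths):
        """One section's integrator: left-leading pulses (positive widths, in
        seconds) source I_REF onto the unit capacitor, right-leading pulses
        (negative widths) sink it. Stops after 64 pulses, as on the chip."""
        v = V_INIT
        for w in signed_pulse_widths[:64]:
            v += I_REF * w / C_UNIT        # dV = I * dt / C
            v = min(max(v, V_MIN), V_MAX)  # rail saturation (cf. Section 4)
        return v

    def shared_bus_voltage(section_voltages):
        """Connecting the unit capacitors of all finished sections to a common
        bus shares their charge; with equal capacitors the bus settles at the
        mean of the section voltages, i.e., the global ITD estimate."""
        return sum(section_voltages) / len(section_voltages)

    # e.g., three sections that each saw 64 left-leading pulses of 4 us:
    v = section_voltage([4e-6] * 64)  # V_INIT + 64 * 4 mV = 2.756 V
    print(shared_bus_voltage([v, v, v]))

With these assumed values, a burst of 64 pulses of 100 µs each would run into the supply rail, which mirrors the saturation-induced underestimation at the most lateral angles reported in Section 4.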
4. Measurement Results

The silicon implementation has been tested using the same recorded sounds, played using a PC soundcard. For each second of sound, an estimate was obtained and a reset signal given. The estimates were recorded using an off-chip A/D converter interfaced with the PC. The mean and standard deviation of the 30 estimates per sound were calculated using Matlab and are plotted in Fig. 7.

[Fig. 7. Measured digital (8 bit) representation of the average capacitor voltage after 64 pulses.]

The aVLSI sound localizer actually has a lower standard deviation of the ITD estimate than the Matlab simulation. This is a result of the noise suppression (the requirement that Idiff exceed Ith in both channels) that was implemented on-chip but not used in the Matlab simulation, because of the excessive simulation time that this would entail. For the extreme values of the ITD, around an angle of 90 degrees, the circuit systematically underestimates the delay. This is due to saturation of the integrator capacitor voltage in some of the sections.

The power consumption of the circuit depends on the ITD measured, as more current is integrated onto the capacitors for larger ITDs. For an ITD generated using a 200 Hz square wave delayed by 100 µs in one of the cochleae with respect to the other, the current consumption of the circuit was 370 µA at a 5 V supply voltage, i.e., the power consumption was less than 2 mW. When Ith was increased to disable all pulse generation, so that the power consumption is mainly due to the cochlear filters, the power consumption dropped to 400 µW.

5. Conclusions

We have successfully realized a low-power aVLSI sound localizer that is capable of estimating the direction of low-frequency sounds (50–300 Hz) captured by two MEMS microphones. The average standard deviation of the estimate is 2.5% of the total range, i.e., 2.5% of 180 degrees of angle, except for the most lateral positions. The final system will use a cross of four microphones and two of the sound localizer chips in order to be able to distinguish the front direction from the back and to improve the accuracy at the more lateral positions.

Acknowledgments

The work in this paper has been supported by DARPA through the "Air-coupled acoustic microsensor technology" program (BAA 00-08).

References

1. L.A. Jeffress, "A place theory of sound localization." Journal of Comparative and Physiological Psychology, vol. 41, pp. 35–39, 1948.
2. J. Lazzaro and C.A. Mead, "A silicon model of auditory localization." Neural Computation, vol. 1, pp. 47–57, 1989.
3. S.A. Shamma, S. Naiming, and P. Gopalaswamy, "Stereausis: Binaural processing without neural delays." Journal of the Acoustical Society of America, vol. 86, pp. 989–1006, 1989.
4. C.A. Mead, X. Arreguit, and J. Lazzaro, "Analog VLSI model of binaural hearing." IEEE Transactions on Neural Networks, vol. 2, pp. 230–236, 1991.
5. M. Slaney, "Auditory toolbox: Version 2." Interval Research Corporation Technical Report 1998-010, 1998.
6. A. van Schaik, E. Fragnière, and E. Vittoz, "Improved silicon cochlea using compatible lateral bipolar transistors." In Advances in Neural Information Processing Systems, vol. 8, Cambridge, MA, 1996.
7. A. van Schaik, "An analog VLSI model of periodicity extraction in the human auditory system." Analog Integrated Circuits and Signal Processing, vol. 26, pp. 157–177, 2001.
8. A. van Schaik and R. Meddis, "Analog very large-scale integrated (VLSI) implementation of a model of amplitude-modulation sensitivity in the auditory brainstem." Journal of the Acoustical Society of America, vol. 105, pp. 811–821, 1999.
9. L. Watts, D.A. Kerns, R.F. Lyon, and C.A. Mead, "Improved implementation of the silicon cochlea." IEEE Journal of Solid-State Circuits, vol. 27, pp. 692–700, 1992.

André van Schaik obtained his M.Sc. in electronics from the University of Twente in 1990. From 1991 to 1994 he worked at CSEM, Neuchâtel, Switzerland, in the Advanced Research group of Prof. Eric Vittoz. In this period he designed several analogue VLSI chips for perceptive tasks, some of which have been industrialised. A good example of such a chip is the artificial, motion-detecting retina in Logitech's Trackman Marble. From 1994 until 1998, he was a research assistant and Ph.D. student with Prof. Vittoz at the Swiss Federal Institute of Technology in Lausanne (EPFL).
The subject of his Ph.D. research was the development of analogue VLSI models of the auditory pathway. In 1998 he was a postdoctoral research fellow at the Auditory Neuroscience Laboratory of Dr. Simon Carlile at the University of Sydney. In April 1999 he became a Senior Lecturer in Computer Engineering in the School of Electrical & Information Engineering at the University of Sydney. He is now a Reader in the same School and Head of the Computing and Audio Research Laboratory. His research interests include analogue VLSI, neuromorphic systems, human sound localisation, and virtual reality audio systems.