EURASIP Journal on Applied Signal Processing 2003:7, 629–638
© 2003 Hindawi Publishing Corporation
An FPGA-Based Electronic Cochlea
M. P. Leong
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Email:
[email protected]
Craig T. Jin
Department of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Email:
[email protected]
Philip H. W. Leong
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Email:
[email protected]
Received 18 June 2002 and in revised form 1 August 2002
A module generator which can produce an FPGA-based implementation of an electronic cochlea filter with arbitrary precision is
presented. Although hardware implementations of electronic cochlea models have traditionally used analog VLSI as the implementation medium due to their small area, high speed, and low power consumption, FPGA-based implementations offer shorter
design times, improved dynamic range, higher accuracy, and a simpler computer interface. The tool presented takes filter coefficients as input and produces a synthesizable VHDL description of an application-optimized design as output. Furthermore, the
tool can use simulation test vectors in order to determine the appropriate scaling of the fixed-point precision parameters for each
filter. The resulting model can be used as an accelerator for research in audition or as the front-end for embedded auditory signal
processing systems. The application of this module generator to a real-time cochleagram display is also presented.
Keywords and phrases: field programmable gate array, electronic cochlea, VHDL modules.
1. INTRODUCTION
The field of neuromorphic engineering has the long-term objective of taking architectures from our understanding of biological systems to develop novel signal processing systems. This field of research, pioneered by Mead [1], has concentrated on using analog VLSI to model biological systems. Research in this field has led to many biologically inspired signal processing systems which have improved performance compared to traditional systems.
The human cochlea is a transducer which converts mechanical vibrations from the middle ear into neural electrical discharges, and additionally provides spatial separation of frequency information in a manner similar to that of a spectrum analyzer [2]. It serves as the front-end signal processing for all functions of the auditory nervous system such as auditory localization, pitch detection, and speech recognition.
Although it is possible to simulate cochlea models in software, hardware implementations may have orders of magnitude of improvement in performance. Hardware implementations are also attractive when the target applications are on embedded devices in which power efficiency and small footprint are design considerations.
[Figure 1 diagram labels: Samples; a chain of IIR biquadratic sections marked as high-frequency sections and low-frequency sections; Output 1, Output 2, ..., Output N.]
Figure 1: Cascaded IIR biquadratic section used in the Lyon and Mead cochlea model.
The electronic cochlea, first proposed by Lyon and Mead
[2], is a cascade of biquadratic filter sections (as shown in
Figure 1) which mimics the qualitative behavior of the human cochlea. Electronic cochleae have been successfully used in auditory signal processing systems for spatial localization [3], pitch detection [4], a computer peripheral [5], amplitude modulation detection [6], correlation [7], and speech recognition [8].
There have been several previous implementations of electronic cochleae in analog VLSI technology. The original
implementation by Lyon and Mead was published in 1988
and used continuous time subthreshold transconductance
circuits to implement a cascade of 480 stages [2, 9]. In
1992, Watts et al. reported a 50-stage version with improved
dynamic range, stability, matching, and compactness [10].
A problem with analog implementations is that transistor
matching issues affect the stability, accuracy, and size of
the filters. This issue was addressed by van Schaik et al. in
1997 using compatible lateral bipolar transistors instead of
MOSFETs in parts of the circuit [11]. Their 104-stage test
chip showed greatly improved characteristics. In addition, a
switched capacitor cochlea filter was proposed by Bor et al.
in 1996 [12].
There have also been several previously reported digital VLSI cochlea implementations. In 1992, Summerfield and Lyon reported an application-specific integrated circuit (ASIC) implementation which employed bit-serial second-order filters [13]. In 1997, Lim et al. reported a VHDL-based pitch detection system which used first-order Butterworth band pass filters for cochlea filtering [14]. Later in 1998, Brucke et al. designed a VLSI implementation of a speech preprocessor which used gammatone filter banks to mimic the cochlea [15]. The implementation by Brucke et al. used fixed-point arithmetic and they also explored trade-offs between wordlength and precision. In 2000, Watts built a 240-tap high-resolution implementation of a cochlea model using FPGA technology (http://www.lloydwatts.com/neuroscience.shtml) and in 2002 a tenth-order recursive cochlea filter was implemented using FPGA technology [16].
A field programmable gate array (FPGA) is an array of
logic gates in which the connections can be configured by
downloading a bitstream into its memory. Traditional ASIC
design requires weeks or months for the fabrication process,
whereas an FPGA can be configured in milliseconds. An additional advantage of FPGA technology is that the same devices can be reconfigured to perform different functions. At
the time of writing this paper in 2002, FPGAs had equivalent
densities of ten million system gates.
Since most systems which employ an electronic cochlea
are experimental in nature, the long design and fabrication
times associated with both analog and digital VLSI technology are a major shortcoming. Recently, FPGA technology has
improved in density to the point where it is possible to develop large scale neuromorphic systems on a single FPGA.
Although these are admittedly larger in area, have higher
power consumption, and may have lower throughput than
the more customized analog VLSI implementations, many
interesting neuromorphic signal processing systems can be
implemented using FPGA technology, enjoying the following advantages over analog and digital VLSI:
(i) shorter design and fabrication time;
(ii) more robust to power supply, temperature, and transistor mismatch variations than analog systems;
(iii) arbitrarily high dynamic range and signal-to-noise ratios can be achieved over analog systems;
(iv) whereas a VLSI design is usually tailored for a single application, the reconfigurability and reusability of an FPGA enable the same system to be used for many applications;
(v) designs can be optimized for each specific instance of a
problem whereas ASICs need to be more general purpose;
(vi) they can be interfaced more easily with a host computer.
The main difficulty that one faces in implementing an
electronic cochlea on an FPGA is the choice of arithmetic
system to be used in the implementation of the underlying
filters. In the module generator which will be presented, a
fixed-point implementation strategy was chosen over floating point since we believed it would result in an implementation with smaller area. Distributed arithmetic (DA)
was used to implement the multipliers associated with the
filters in an efficient manner. Finally, a module generator
which can generate synthesizable VHDL descriptions of arbitrary wordlength fixed-point cochlea filters was developed.
The module generator can also be used, together with our
fp simulation tool [17, 18], to determine the minimum and
maximum ranges of all variables. This range information is
then used to determine the maximal number of fractional
bits which can be used in the variable’s two’s complement
fraction representation, hence minimizing quantization
error.
The FPGA implementation of the electronic cochlea described here can serve as a computational accelerator in its
own right, or be used as a front-end preprocessing stage for
embedded auditory applications. As a sample application, a
real-time cochleagram display is presented.
The rest of the paper is organized as follows. In Section 2,
Lyon and Mead’s cochlea model is described. Section 3 describes the implementation of the filter stages using DA. Our
design methodology is presented in Section 4, followed by
results in Section 5. Conclusions are drawn in Section 6.
2. LYON AND MEAD’S COCHLEA MODEL
Lyon and Mead proposed the first electronic cochlea in 1988
[2, 19]. This model captured the qualitative behavior of the
human cochlea using a simple cascade of second order filter stages which they implemented in analog VLSI. In this
section, a very superficial summary of the Lyon and Mead
cochlea model is given. More detailed descriptions of the
cochlea can be found in [2, 20].
The human cochlea, or inner ear, is a three-dimensional
fluid dynamic system which converts mechanical vibrations
from the middle ear into neural electrical discharges [2]. It
is composed of the basilar membrane, inner hair cells, and
outer hair cells. The cochlea connects to higher levels in the
auditory pathway for further processing.
The basilar membrane is a longitudinal membrane within the cochlea. The oval window provides the input to the
cochlea. Vibrations of the eardrum are coupled via bones in
the middle ear to the oval window causing a traveling wave
from base to apex along the basilar membrane. The basilar
membrane has a filtering action and can be thought of as
a cascade of lowpass filters with exponentially decreasing cutoff frequency from base to apex.
The result of the filtering of the basilar membrane at any point along its length is a bandpass filtered version of the input signal, with center frequency decreasing along its length. Different distances along the basilar membrane are tuned to specific frequencies in a manner similar to that of a spectrum analyzer. A simplified box model showing a sinusoidal wave traveling along an uncoiled cochlea is shown in Figure 2.
[Figure 2 diagram labels: Oval window, Base, Basilar membrane, Apex.]
Figure 2: Illustration of a sine wave travelling through a simplified box model of an uncoiled cochlea (adapted from [2]).
Several thousand inner hair cells are distributed along the basilar membrane and convert the displacement of the basilar membrane to a neural signal. The hair cells also perform a half-wave rectifying function since only displacements in one direction will cause neurons to fire.
The outer hair cells perform automatic gain control by changing the damping of the basilar membrane. It is interesting to note that there are approximately three times more outer hair cells than inner hair cells.
In order to simulate the properties of the basilar membrane, Lyon and Mead’s cochlea model used a cascade of scaled second-order lowpass filters with the transfer function
$$H(s) = \frac{1}{\tau^{2} s^{2} + (1/Q)\tau s + 1}, \tag{1}$$
where Q represents the damping characteristic (or quality) of the filter and τ the time constant. In the cochlea filter, the τ of each filter is varied exponentially along the cascade, causing filters to have exponentially decreasing cutoff frequencies. The Q of all the filters is held constant. The output of each filter corresponds to the displacement at different positions along the basilar membrane.
3. IIR FILTERS USING DA
3.1. Distributed arithmetic
DA offers an efficient method to implement a sum of products (SOP) provided that one of the variables does not change during execution. Instead of requiring a multiplier, DA utilizes a precomputed lookup table [21, 22].
Consider the SOP, S, of N terms
$$S = \sum_{i=0}^{N-1} k_i x_i, \tag{2}$$
where $k_i$ is the (fixed) weighting factor and $x_i$ is the input. For two’s complement fractions, the numerical value of $x_i = \{x_{i0}\, x_{i1} \cdots x_{i(n-1)}\}$ is
$$x_i = -x_{i0} + \sum_{b=1}^{n-1} x_{ib} \times 2^{-b}. \tag{3}$$
Substituting (3) into (2) yields
$$\begin{aligned}
S = {} & -\left[x_{00} \times k_0 + x_{10} \times k_1 + \cdots + x_{(N-1)0} \times k_{N-1}\right] \times 2^{0} \\
& + \left[x_{01} \times k_0 + x_{11} \times k_1 + \cdots + x_{(N-1)1} \times k_{N-1}\right] \times 2^{-1} \\
& + \left[x_{02} \times k_0 + x_{12} \times k_1 + \cdots + x_{(N-1)2} \times k_{N-1}\right] \times 2^{-2} \\
& \;\;\vdots \\
& + \left[x_{0(n-1)} \times k_0 + x_{1(n-1)} \times k_1 + \cdots + x_{(N-1)(n-1)} \times k_{N-1}\right] \times 2^{-(n-1)}.
\end{aligned} \tag{4}$$
[Figure 3 diagram labels: input X(n) with delayed values X(n − 1) and X(n − 2) obtained through z^−1 elements and weighted by b0, b1, b2; output Y(n) with delayed values Y(n − 1) and Y(n − 2) obtained through z^−1 elements and weighted by −a1, −a2.]
Figure 3: The architecture of an IIR biquadratic section.
The organization of the input variables is in a bit-serial least significant bit (LSB) first format. Since $x_{ij} \in \{0, 1\}$ ($i = 0, 1, \ldots, N-1$; $j = 0, 1, \ldots, n-1$), each term within the brackets of (4) is the sum of a subset of the weighting factors $k_0, k_1, \ldots, k_{N-1}$. On every clock cycle, one of the bracketed terms of S can thus be computed by applying the current bits of $x_0, x_1, \ldots, x_{N-1}$ as the address inputs of a $2^{N}$-entry read-only memory (ROM). The contents of the ROM are precomputed from the constant $k_i$’s and are shown in Table 1. The output of the ROM is multiplied by a power of two (a shift operation) and then accumulated. After n cycles, the accumulator contains the value of S.
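The DA procedure described above can be illustrated with a small software sketch. It is not the generated VHDL: the function names `build_da_rom` and `da_sum_of_products` are hypothetical, the coefficients and inputs are assumed to be plain integers (scaled fixed-point values), and the accumulator shifts each partial sum left rather than shifting the running total right as a hardware scaling accumulator would. The ROM holds one entry per combination of input bits, and the partial sum addressed by the sign bits is subtracted, as in the first bracketed term of (4).

```python
def build_da_rom(coeffs):
    """Precompute the lookup table: the entry at address a is the sum of
    coeffs[i] over every bit i that is set in a."""
    n_inputs = len(coeffs)
    return [
        sum(coeffs[i] for i in range(n_inputs) if (addr >> i) & 1)
        for addr in range(1 << n_inputs)
    ]


def da_sum_of_products(rom, inputs, n_bits):
    """Bit-serial, LSB-first sum of products.

    inputs are signed integers that fit in n_bits two's complement bits.
    One ROM lookup is performed per bit position (one per clock cycle in
    hardware); no multiplier is used.
    """
    mask = (1 << n_bits) - 1
    words = [x & mask for x in inputs]  # two's complement bit patterns
    acc = 0
    for j in range(n_bits):  # j = 0 is the LSB
        addr = 0
        for i, word in enumerate(words):
            addr |= ((word >> j) & 1) << i  # bit j of input i -> address bit i
        partial = rom[addr] << j  # scale the partial sum by 2**j
        if j == n_bits - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc


# Illustrative values only: three fixed coefficients, three 8-bit inputs.
rom = build_da_rom([3, -5, 7])
result = da_sum_of_products(rom, [10, -20, 127], n_bits=8)
# result equals 3*10 + (-5)*(-20) + 7*127
```

The table grows as two to the power of the number of inputs, which is why the technique suits short sums of products with fixed coefficients.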
3.2. Digital IIR filters
A general IIR second-order filter has a transfer function of the form
$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}. \tag{5}$$
[Figure 4 diagram labels: Serial input, SRL16E shift registers, DA LUT (ROM32X1), Scaling accumulator, Parallel-to-serial converter, Serial output.]
Figure 4: Implementation of an IIR biquadratic section on a Xilinx Virtex FPGA.
Table 1: Contents of a DA ROM. For each address, the terms k_i for which b_i = 1 are summed.
b_{N−1} ··· b2 b1 b0    Address    Contents
0 ··· 000               0          0
0 ··· 001               1          k0
0 ··· 010               2          k1
0 ··· 011               3          k0 + k1
0 ··· 100               4          k2
0 ··· 101               5          k0 + k2
0 ··· 110               6          k2 + k1
0 ··· 111               7          k0 + k1 + k2
...                     ...        ...
1 ··· 111               2^N − 1    k0 + k1 + ··· + k_{N−1}
The corresponding time domain IIR filter can be implemented by the function
$$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) - a_1 y(n-1) - a_2 y(n-2), \tag{6}$$
where x(n − k) is the kth previous input, y(n − k) is the kth
previous output, and y(n) is the output. The operation is essentially the SOP of five terms, and can be directly mapped
to a biquadratic section as shown in Figure 3.
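As an illustration of the recurrence and of the cascade in Figure 1, the following floating-point sketch filters a sequence through one biquadratic section and then through a chain of sections, keeping the output of every section as a tap. The function names `biquad` and `cascade` are hypothetical, and the feedback coefficients are assumed to be taken from the denominator of (5), so they enter with a negative sign as drawn in Figure 3. It is a reference model only and says nothing about the fixed-point behavior of the hardware.

```python
def biquad(b, a, x):
    """Filter the sequence x through one second-order section.

    b = (b0, b1, b2) are the numerator coefficients of (5);
    a = (a1, a2) are the denominator coefficients of (5).
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # delayed inputs and outputs, initially zero
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift the input delay line
        y2, y1 = y1, yn  # shift the output delay line
        out.append(yn)
    return out


def cascade(sections, x):
    """Pass x through the sections in order and return every section's output.

    sections is a list of (b, a) pairs; the result has one tap per section.
    """
    taps = []
    signal = list(x)
    for b, a in sections:
        signal = biquad(b, a, signal)
        taps.append(signal)
    return taps


# Illustrative values only: impulse response of y(n) = x(n) + 0.5 y(n - 1).
impulse = [1.0, 0.0, 0.0, 0.0]
response = biquad((1.0, 0.0, 0.0), (-0.5, 0.0), impulse)
```

Each tap of `cascade` corresponds to one of the outputs labeled Output 1 to Output N in Figure 1.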
Figure 4 illustrates our actual implementation using DA (described in Section 3.1) on a Xilinx Virtex FPGA. The previous values x(n − 1), x(n − 2), and y(n − 2) are implemented using shift registers with the number of stages equal to the wordlength of the variables used. The shift registers are implemented by cascades of Virtex SRL16E primitives for minimum area. The DA ROM takes x(n), x(n − 1), x(n − 2), y(n − 1), and y(n − 2) as inputs to generate partial sums (bracketed terms in (4)). As there are 5 inputs, the required number of entries in the ROM is 2^5 = 32, leading to an efficient implementation using Xilinx ROM32X1 primitives. The scaling accumulator shifts and adds the output from the ROM (unscaled partial sum in bit-parallel organization) at every cycle to produce y(n). In the last cycle of scaling and accumulation, the parallel-to-serial converter latches the value at the scaling accumulator. Since the scaling accumulator has a latency equal to the wordlength of the variables, the value latched by the converter is y(n − 1).
4. DESIGN METHODOLOGY
Given the filter coefficients, the designer selects appropriate
values of filter wordlength and the number of bits (width)
of the DA ROM’s output. Note that all filter sections have
the same wordlength although the allocation of integer and
fractional parts used within each filter section can vary.
The cochlea filter model is written in a subset of C which
supports only expressions and assignments [17, 18]. A compiler uses standard parsing techniques to translate expressions into directed acyclic graphs (DAGs). Each operator is
mapped to a module which is a software object consisting of
a set of parameters, a simulator, and a component generator.
The simulator can perform the operation at a requested precision to determine range information. It can also compare
fixed-point output with a floating-point computation to derive error statistics.
As an input, the fp cochlea generator takes the coefficients obtained from an auditory toolbox, the wordlength of
variables, and the width of the DA ROM. Although inputs
and outputs of all filter sections are of the same wordlength,
their fractional wordlength can be different (two’s complement fractions are used). The dynamic ranges of inputs and
outputs are determined by fp through simulation of a set of
user-supplied test vectors. The generator performs simulation using the test vectors as inputs and the range of each
variable can be determined. From this information, the minimum number of bits needed for the integer part of each variable is known and, since the wordlength is fixed, the maximum number of bits can be assigned to the fractional part of
the variable.
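The allocation of integer and fractional bits from observed ranges can be sketched as follows. This is not the algorithm used inside fp, whose internals are not given here; it is a minimal illustration under the assumptions that a variable is stored as a signed two's complement word, that values are rounded to the nearest representable number, and that at least the sign bit is kept as an integer bit. The function name `max_fractional_bits` is hypothetical.

```python
def max_fractional_bits(lo, hi, wordlength):
    """Return the largest number of fractional bits for which both ends of
    the observed range [lo, hi] fit in a signed word of the given length,
    or None if the range does not fit even with no fractional bits."""
    int_min = -(1 << (wordlength - 1))
    int_max = (1 << (wordlength - 1)) - 1
    # Try the most fractional bits first; keep at least the sign bit.
    for frac in range(wordlength - 1, -1, -1):
        scale = 1 << frac
        if int_min <= round(lo * scale) and round(hi * scale) <= int_max:
            return frac
    return None


# Illustrative values only: a variable observed between -3.2 and 2.5 during
# simulation, stored in a 16-bit word.
frac_bits = max_fractional_bits(-3.2, 2.5, 16)
```

A wider observed range forces more integer bits and therefore fewer fractional bits, which is the trade-off that the test vectors are used to resolve.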
After deducing the best representation for each variable, the generator outputs synthesizable VHDL code that describes an implementation of the corresponding cochlea model. The fractional wordlengths of the scaling accumulator and the output variable can be different, so the operator must also include a mechanism to convert the former to the latter. Since the output of the scaling accumulator is bit-parallel while the output variable is bit-serial, the parallel-to-serial converter can perform format scaling by selecting the appropriate bits to serialize. The resulting VHDL description can then be used as a core in other designs.
[Figure 5 plots: gain (dB) versus frequency (Hz) for panels (a) (12-bit, 12-bit), (b) (12-bit, 16-bit), (c) (12-bit, 24-bit), (d) (16-bit, 12-bit), (e) (16-bit, 16-bit), (f) (16-bit, 24-bit), (g) (24-bit, 12-bit), (h) (24-bit, 16-bit), (i) (24-bit, 24-bit), (j) (32-bit, 12-bit), (k) (32-bit, 16-bit), (l) (32-bit, 24-bit).]
Figure 5: Frequency responses of cochlea implementations with different wordlength and width of ROMs (wordlength, ROM width).
The high-level cochlea model description is approximately 60 lines of C code. From it, the generator produces approximately 50000 lines of VHDL code for the case of a cochlea filter with 88 biquadratic sections.
[Figure 6 plots: (a) impulse response (software) and (b) impulse response (hardware), amplitude versus time; (c) frequency response (software) and (d) frequency response (hardware), gain (dB) versus frequency (Hz).]
Figure 6: Impulse response of (a) the software floating-point implementation and (b) hardware 16-bit wordlength, 16-bit ROM width implementation. Frequency response of (c) the software floating-point implementation and (d) hardware 16-bit wordlength, 16-bit ROM width implementation.
[Figure 7 plot: quantization error (dB) versus wordlength and width of LUT.]
Figure 7: Mesh plot showing the quantization errors of implementations with varying wordlengths and DA ROM widths.
5. RESULTS
The cochlea implementation was tested on an Annapolis “Wildstar” Reconfigurable Computing Engine [23] which is a PCI-based reconfigurable computing platform containing three Xilinx Virtex XCV1000-BG560-6 FPGAs. The cochlea implementations were verified by comparing Synopsys VHDL Simulator simulations with the results produced by a floating-point software model. Synthesis and implementation were performed using Synopsys FPGA Express 3.4 and Xilinx Foundation 3.2i, respectively.
5.1. Trade-offs among wordlength, width of DA ROM, and precision
The coefficients for the biquadratic filters in our implementation of Lyon and Mead’s cochlea model were obtained using Slaney’s Auditory Toolbox [24]. This Matlab toolbox has
several different cochlea models, test inputs, and visualization tools. The same toolbox was used to verify our designs
and produce cochleagram plots.
The coefficients of these implementations were obtained
from the Auditory Toolbox using the Matlab command
DesignLyonFilters(16000, 8, 0.25), which specifies
a 16 kHz sampling rate, Q = 8, and a spacing which gives 88
biquadratic filters. A series of cochlea implementations, with
wordlengths from 10 to 32 bits and DA ROM width from 10
to 24 bits, was generated in order to present the trade-offs
among wordlengths, widths of DA ROMs, and precisions.
In order to present the improvement in precision with
increasing wordlengths and ROM width, the frequency responses of several different fixed-point implementations are
plotted in Figure 5. Figure 6 shows impulse and frequency responses obtained from a software floating-point implementation and from a hardware implementation with 16-bit wordlength and 16-bit ROM width.
It can be observed that the filter accuracy gradually improves with increasing wordlength or ROM width. When wordlengths or ROM widths are too small, there are significant quantization effects that may result in oscillation (as in the 12-bit wordlength implementations) or improper frequency responses at certain frequency intervals (as in the 12-bit DA ROM implementations). With 24-bit wordlength and 16-bit ROMs, for example, the total quantization error is −39.46 dB, which is sufficient for most speech applications. Figure 7 shows the trend of improved quantization error with increasing wordlength and ROM width.
[Figure 8 diagram labels: PCI bus; PCI/LAD bus interface; LAD bus interface; Block RAM; Parallel-to-serial converter; Cochlea core; for each cochlea output, a Serial-to-parallel converter, Half-wave rectifier, and Accumulator; XCV1000 FPGA; “Wildstar” platform.]
Figure 8: System architecture of the cochleagram display.
Table 2: Area requirements of an 88-section cochlea implementation of different wordlengths and ROM width (number of slices).
              ROM width
Wordlength    12-bit    16-bit    20-bit    24-bit
12-bit        5770      6582      7440      8340
16-bit        6160      6800      7589      8515
20-bit        6914      7343      7874      8602
24-bit        7620      8048      8578      9106
28-bit        8288      8748      9278      9805
32-bit        9297      9716      10245     10771
Table 3: Maximum clock rates and corresponding sampling rates of 88-section cochlea implementations of different wordlengths and ROM width (maximum clock rate (MHz) and maximum sampling rate (MHz)).
              ROM width
Wordlength    12-bit         16-bit         20-bit         24-bit
12-bit        56.42, 4.70    62.79, 5.23    69.49, 5.79    67.24, 5.60
16-bit        67.48, 4.22    67.54, 4.22    65.16, 4.07    65.02, 4.06
20-bit        64.48, 3.22    63.58, 3.18    61.86, 3.09    61.79, 3.09
24-bit        64.24, 2.68    60.98, 2.54    57.94, 2.41    59.47, 2.48
28-bit        60.22, 2.15    57.93, 2.07    54.00, 1.93    49.09, 1.75
32-bit        62.68, 1.96    63.11, 1.97    65.23, 2.04    63.09, 1.97
Area requirements, maximum clock rates, and maximum sampling rates of these implementations on a Xilinx Virtex XCV1000-6 FPGA, as reported by the Xilinx implementation tools, are shown in Tables 2 and 3. A Xilinx
XCV1000 has 12288 slices and the largest currently available
parts, XCV3200, have 32448 slices. As a bit-serial architecture was employed, the effective sampling rates of the implementations are their maximum clock rates divided by their
wordlengths.
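The relationship between clock rate and sampling rate for the bit-serial architecture is simple arithmetic, sketched below using a few entries from Table 3. The function name `effective_sampling_rate_mhz` is hypothetical and is introduced only for illustration.

```python
def effective_sampling_rate_mhz(clock_mhz, wordlength_bits):
    """One sample is produced every wordlength_bits clock cycles, so the
    sampling rate is the clock rate divided by the wordlength."""
    if wordlength_bits <= 0:
        raise ValueError("wordlength must be a positive number of bits")
    return clock_mhz / wordlength_bits


# (wordlength in bits, maximum clock rate in MHz) pairs taken from Table 3.
table3_entries = [(12, 56.42), (16, 67.54), (24, 60.98)]
rates = [effective_sampling_rate_mhz(clk, wl) for wl, clk in table3_entries]
```

Rounded to two decimal places, these values agree with the sampling rates listed alongside the same clock rates in Table 3.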
5.2. Application to a cochleagram display
A 24-bit wordlength, 16-bit DA ROM implementation was
used to construct a cochleagram display application. Due to
limited hardware resources on a Xilinx XCV1000-6 FPGA,
only the first 60 out of the 88 cochlea sections were used.
These cochlea sections correspond to a frequency range of
1006 to 7630 Hz.
The design of the cochleagram display is shown in
Figure 8. The host PC writes input data into a dual-port
block RAM (256 × 32-bit synchronous RAM) which passes
through a parallel-to-serial converter and enters the cochlea
core. Each of the outputs of the cochlea core undergoes
serial-to-parallel conversion followed by half-wave rectification (to model the functionality of the inner hair cells). The
outputs were integrated over 256 samples and sent to the PC
for display.
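The post-processing applied to the cochlea outputs can be sketched in software as follows. This is an illustrative model rather than the hardware design: the function name `cochleagram` is hypothetical, the channel outputs are assumed to be available as lists of numbers of equal length, rectification is assumed to keep positive values, and a trailing incomplete frame is simply dropped.

```python
def cochleagram(channels, frame_len=256):
    """Half-wave rectify each channel and integrate it over frames.

    channels is a list with one list of samples per cochlea output.
    The result has one column per complete frame; each column holds one
    accumulated value per channel.
    """
    if not channels:
        return []
    n_frames = len(channels[0]) // frame_len
    columns = []
    for f in range(n_frames):
        start, stop = f * frame_len, (f + 1) * frame_len
        # max(s, 0) is the half-wave rectifier; sum() is the accumulator.
        columns.append([sum(max(s, 0) for s in ch[start:stop]) for ch in channels])
    return columns


# Illustrative values only: two channels, frames of four samples.
example = cochleagram([[1, -1, 1, -1, 1, -1, 1, -1], [-2] * 8], frame_len=4)
```

Each column of the result corresponds to one vertical slice of a cochleagram image, with one value per cochlea channel.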
The cochleagram display was tested with several different inputs. Figure 9 shows the cochleagrams produced from
swept-sine wave and the auditory toolbox’s “tapestry” inputs;
the former is a 25-second linear chirp and the latter is the
speech file of a woman saying “a huge tapestry hung in her
hallway.”
[Figure 9 plots: cochlea channel versus time sample for (a) swept-sine wave and (b) “tapestry” input.]
Figure 9: Cochleagrams of (a) swept-sine wave and (b) “tapestry” inputs. The former has 400000 samples while the latter has 50380 samples. Only the first 60 out of the 88 cochlea sections were used because of limited hardware resources on a Xilinx XCV1000-6 FPGA. These cochlea sections correspond to a frequency range of 1006–7630 Hz.
The cochleagram display requires 10344 slices and can be
clocked at 44.15 MHz, yielding a sampling rate of 1.84 MHz
(or 115 times faster than real-time performance). Including
software and interfacing overheads, the measured throughput on the “Wildstar” platform was 238 kHz. As a comparison, the auditory toolbox achieves a 64 kHz throughput on a
Sun Ultra-5 360 MHz machine. The performance could be further improved by using larger and/or faster speed grade FPGAs, or via improved floorplanning of the design which would allow a higher clock frequency.
It is interesting to compare the FPGA-based cochleagram system with a similar system developed in analog VLSI by Lazzaro et al. in 1994 [5]. Using a 2 µm CMOS process, they integrated a 119-stage silicon cochlea (with a slightly more sophisticated hair cell model), nonvolatile analog storage, and a sophisticated event-based communications protocol on a single 3.6 × 6.8 mm² chip with a power consumption of 5 mW. The analog VLSI version has improved density and power consumption compared with the FPGA approach. However, the FPGA version is vastly simpler, easier to modify, has a shorter design time, and is much more tolerant of supply voltage, temperature, and transistor matching variations. Although quantitative results are not available, it is expected that the FPGA version also has better filter accuracy, has a wider dynamic range, and can operate at higher Q without instability.
We believe that there are many applications of the FPGA
cochlea including the development of more refined cochlea
or cochlea-like models. An FPGA cochlea is particularly
suited as a testbed for algorithms that involve concurrent
processing across cochlea channels such as (i) more realistic hair cell models, (ii) auditory streaming and the separation of foreground stimuli from background noise, (iii) auditory processing in reverberant environments, (iv) human
sound localization, and (v) bat echolocation. In addition,
the FPGA platform provides an avenue for developing, simulating, and studying auditory processing in more complicated, but realistic acoustic environments, that involve multiple sound sources, multiple reflection paths, and external
ear acoustic filtering that varies with sound direction. The
signal processing required to simulate such realistic environments is computationally intensive and some of this preprocessing can be incorporated into the FPGA platform enabling real-time studies of auditory processing under realistic
acoustic conditions. We are also interested in finding ways in
which FPGA cochlea models can assist in adapting or translating cochlea processing principles into engineering implementations. Future projects include using the FPGA cochlea
in comparisons of a cochlea model with an analysis-synthesis
filter bank as used in perceptual audio coding, audio visualization displays, and a neuromorphic isolated word spotting
system with cochlea preprocessing. Although modern digital signal processors (DSPs) are capable of achieving similar
or even higher performance, the FPGA may have advantages
in terms of power consumption and smaller footprint. The
availability of more than 100 dedicated high-speed multipliers in newer Virtex II [25] devices would enable implementations with much higher throughput than the implementation
presented in this paper and would also free up more FPGA
logic resources for implementing hair cell and higher-level
processing models.
6. CONCLUSION
A parameterized FPGA implementation of an electronic
cochlea that can be used as a building block for many systems
which model the human auditory pathway was developed.
This electronic cochlea demonstrates the feasibility of incorporating large neuromorphic systems on FPGA devices. Neuromorphic systems employ parallel distributed processing
which is well suited to FPGA implementation, and may offer significant advantages over conventional architectures.
FPGAs provide a very flexible platform for the development of experimental neuromorphic circuits and offer advantages in terms of faster design time, faster fabrication
time, wider dynamic range, better stability, and simpler computer interface over analog VLSI implementations.
ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewer.
The work described in this paper was supported by a direct
grant from the Chinese University of Hong Kong (Project
code 2050240), the German Academic Exchange Service, and
the Research Grants Council of Hong Kong Joint Research
Scheme (Project no. G HK010/00).
REFERENCES
[1] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley,
Boston, Mass, USA, 1989.
[2] R. F. Lyon and C. Mead, “An analog electronic cochlea,” IEEE
Trans. Acoustics, Speech, and Signal Processing, vol. 36, no. 7,
pp. 1119–1134, 1988.
[3] J. P. Lazzaro and C. Mead, “Silicon models of auditory localization,” Neural Computation, vol. 1, pp. 47–57, Spring 1989.
[4] J. P. Lazzaro and C. Mead, “Silicon models of pitch perception,” Proc. National Academy of Sciences, vol. 86, no. 23, pp.
9597–9601, 1989.
[5] J. P. Lazzaro, J. Wawrzynek, and A. Kramer, “Systems technologies for silicon auditory models,” IEEE Micro, vol. 14, no.
3, pp. 7–15, 1994.
[6] A. van Schaik and R. Meddis, “Analog very large-scale integrated (VLSI) implementation of a model of amplitude-modulation sensitivity in the auditory brainstem,” Journal of
the Acoustical Society of America, vol. 105, no. 2, pp. 811–821,
1999.
[7] C. A. Mead, X. Arreguit, and J. P. Lazzaro, “Analog VLSI
model of binaural hearing,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 230–236, 1991.
[8] J. P. Lazzaro, J. Wawrzynek, and R. P. Lippmann, “Micro power analog circuit implementation of hidden Markov
model state decoding,” IEEE Journal Solid State Circuits, vol.
32, no. 8, pp. 1200–1209, 1997.
[9] R. F. Lyon, “Analog implementations of auditory models,” in
Proc. DARPA Workshop on Speech and Natural Language, Morgan Kaufman, Pacific Grove, Calif, USA, February 1991.
[10] L. Watts, D. A. Kerns, R. F. Lyon, and C. A. Mead, “Improved
implementation of the silicon cochlea,” IEEE Journal Solid
State Circuits, vol. 27, no. 5, pp. 692–700, 1992.
[11] A. van Schaik, E. Fragnière, and E. Vittoz, “Improved silicon
cochlea using compatible lateral bipolar transistors,” in Advances in Neural Information Processing Systems, vol. 8, MIT
Press, Cambridge, Mass, USA, 1997.
[12] J.-C. Bor and C.-Y. Wu, “Analog electronic cochlea design
using multiplexing switched-capacitor circuits,” IEEE Transactions on Neural Networks, vol. 7, no. 1, pp. 155–166, 1996.
[13] C. D. Summerfield and R. F. Lyon, “ASIC implementation of
the Lyon cochlea model,” in Proc. IEEE Int. Conf. Acoustics,
Speech, Signal Processing, pp. 673–676, San Francisco, Calif,
USA, March 1992.
[14] S. C. Lim, A. R. Temple, and S. Jones, “VHDL-based design
of biologically inspired pitch detection system,” in Proc. IEEE
International Conference on Neural Network, vol. 2, pp. 922–927,
Houston, Tex, USA, June 1997.
[15] M. Brucke, W. Nebel, A. Schwarz, B. Mertsching, M. Hansen,
and B. Kollmeier, “Digital VLSI-implementation of a psychoacoustically and physiologically motivated speech preprocessor,” in Proc. NATO Advanced Study Institute on Computational Hearing, pp. 157–162, Il Ciocco, Italy, 1998.
[16] A. Mishra and A. E. Hubbard, “A cochlear filter implemented
with a field-programmable gate array,” IEEE Trans. on Circuits
and Systems II: Analog and Digital Signal Processing, vol. 49,
no. 1, pp. 54–60, 2002.
[17] M. P. Leong, M. Y. Yeung, C. K. Yeung, C. W. Fu, P. A. Heng,
and P. H. W. Leong, “Automatic floating to fixed point translation and its application to post-rendering 3D warping,” in
Proc. 7th IEEE Symposium on Field-Programmable Custom
Computing Machines, pp. 240–248, Napa, Calif, USA, April
1999.
[18] M. P. Leong and P. H. W. Leong, “A variable-radix digit-serial
design methodology and its applications to the discrete cosine
transform,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 11, no. 1, pp. 90–104, 2003.
[19] R. F. Lyon and C. Mead, “Electronic cochlea,” in Analog VLSI and Neural Systems, Addison-Wesley, Boston, Mass,
USA, 1989, Chapter 16.
[20] J. O. Pickles, An Introduction to the Physiology of Hearing, Academic Press, London, UK, 1988.
[21] G. R. Goslin, A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific Digital Signal Processing
Performance, Xilinx, San Jose, Calif, USA, 1995, Application
Note.
[22] Xilinx, The Role of Distributed Arithmetic in FPGA-based Signal Processing, November 1996, http://www.xilinx.com.
[23] Annapolis Micro Systems, Wildstar Reference Manual, 2000,
Revision 3.3.
[24] M. Slaney, “Auditory toolbox: A MATLAB Toolbox for auditory modeling work,” Tech. Rep. 1998-010, Interval Research
Corporation, Palo Alto, Calif, USA, 1998, Version 2.
[25] Xilinx, Virtex-II Platform FPGA User Guide, 2001, Version
1.1.
M. P. Leong received his B.E. and Ph.D. degrees from The Chinese University of Hong
Kong in 1998 and 2001, respectively. He is
currently the System Manager of the Center for Large-Scale Computation at the same
University. His research interests include
network security, parallel computing, and
field-programmable systems.
Craig T. Jin received his B.S. degree
(1989) from Stanford University, M.S. degree (1991) from Caltech, and Ph.D. degree
(2001) from the University of Sydney. He is
currently a Lecturer at the School of Electrical and Information Engineering at the University of Sydney. Together with André van
Schaik, he heads the Computing and Augmented Reality Laboratory. He is the author
of over thirty technical papers and three
patents. His research interests include multimedia signal processing, 3D audio engineering, programmable analogue VLSI filters,
and reconfigurable computing.
Philip H. W. Leong received the B.S., B.E.,
and Ph.D. degrees from the University of
Sydney in 1986, 1988, and 1993, respectively. In 1989, he was a Research Engineer
at AWA Research Laboratory, Sydney, Australia. From 1990 to 1993, he was a postgraduate student and Research Assistant at
the University of Sydney, where he worked
on low-power analog VLSI circuits for arrhythmia classification. In 1993, he was a
Consultant to SGS Thomson Microelectronics in Milan, Italy. He
was a Lecturer at the Department of Electrical Engineering, University of Sydney from 1994 to 1996. He is currently an Associate
Professor at the Department of Computer Science and Engineering at the Chinese University of Hong Kong and the Director of
the Custom Computing Laboratory. He is the author of more than
fifty technical papers and three patents. His research interests include reconfigurable computing, digital systems, parallel computing, cryptography, and signal processing.