Optimal Filter Truncation Time For Matched Array Processing
Dwight E Macomber
Microphone arrays provide a means to selectively capture distant sound sources. Matched filter array (MFA) processing has proven to be a useful extension of time delay compensation (TDC) when the array operates in a reverberant environment [5, 6]. The MFA processing technique performs coherent additions of direct and reflected signal energy in order to improve the signal-to-noise ratio (SNR) of the desired signal over background noise. MFA processing consists of convolving the captured signal set from the array with the corresponding set of time-reversed focus-to-sensor responses. These can be obtained from the image method [1] or measured using a test signal [2, 3]. In large or highly reverberant rooms, the focus-to-sensor responses can be quite long, on the order of 1 second. As a result, the computational complexity of the algorithm, as well as system delay and subjective considerations, become factors in determining system feasibility.

Truncating the focus-to-sensor responses used to construct the matched filters is an effective method for alleviating these problems. The length of the focus-to-sensor responses is a function of both enclosure geometry and acoustic reflectivity. After the first arrivals, the sound intensity in a room decays approximately exponentially with time; hence the strongest reflections usually occur early in the focus-to-sensor response. As the later reflections are much weaker, they can be excluded from the matched filters without significantly affecting the performance of MFA processing. In addition, due to the nature of matched filter processing, the later reflections cause an anticausal echo in the processed signal. Truncation of the MFA reduces this effect [6].

Work supported by NSF Grant No. MIP-9314625.

In an enclosure, sound propagation from source to microphone can be modeled as a transfer function. Thus, a sound signal $s(t)$ captured by a microphone in an enclosure can be expressed as:

$$m(t) = s(t) * h(t) \tag{1}$$
where $h(t)$ is the filter corresponding to the model transfer function, and "*" denotes convolution. Impulse responses of room transfer functions (RTFs) may be simply modeled (in continuous time) as the sum of a set of impulses. The first impulse represents the arrival of the direct wave. The progressively delayed and attenuated impulses that follow represent the multiple reflections arriving at the sensor location; they are commonly referred to as reverberation. Thus, the acoustic source-to-sensor pressure response is of the form

$$h(t) = \sum_{j=1} \beta_j \, \delta(t - \tau_j) \tag{2}$$

where the corresponding sets of $\beta_j$ and $\tau_j$ depend on the acoustic properties of the signal path between the sound source and the sensor. The MFA algorithm consists of filtering the input signals obtained from each microphone with the time reverse of the corresponding focus-to-sensor impulse response. For a sound source located at the array focus, the effect of the matched filter is to convolve the undistorted signal with the autocorrelation of the focus-to-sensor response:

$$y_i(t) = m_i(t) * h_{fi}(-t) = s(t) * h_{fi}(t) * h_{fi}(-t) \tag{3}$$
where $m_i(t)$ is the signal received at sensor $i$. A simple case for a single matched filter corresponding to an enclosure with two reflections is shown in Figure 1. The output of the MFA is the sum of the outputs of each individual matched filter. For a single sound source at the focal point, this is:

$$y_{ON}(t) = \sum_{i=1}^{N} \{ s_{ON}(t) * h_{fi}(t) \} * h_{fi}(-t) = s_{ON}(t) * \sum_{i=1}^{N} h_{fi}(t) * h_{fi}(-t) \tag{4}$$

Figure 1: Effect of matched filtering in an enclosure.
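The structure of (4) can be checked numerically. The following is a minimal sketch, assuming hypothetical sparse focus-to-sensor responses in the spirit of (2) (a unit direct arrival plus randomly delayed, weaker reflections); the helper `sparse_response`, the sensor count, and the tap counts are illustrative choices, not values from the paper. It forms the summed autocorrelation term of (4) and an on-focus output for a random test signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_response(n_taps, n_reflections, rng):
    """Hypothetical focus-to-sensor response: a unit direct arrival followed
    by weaker reflections at random delays (illustrative stand-in for (2))."""
    h = np.zeros(n_taps)
    h[0] = 1.0  # direct arrival
    delays = rng.choice(np.arange(1, n_taps), size=n_reflections, replace=False)
    h[delays] = rng.uniform(0.2, 0.8, size=n_reflections)
    return h

n_sensors, n_taps = 16, 200
responses = [sparse_response(n_taps, 10, rng) for _ in range(n_sensors)]

# Summation term of (4): sum over sensors of h_fi(t) * h_fi(-t).
# Convolving a response with its time reverse gives its autocorrelation.
system = sum(np.convolve(h, h[::-1]) for h in responses)

center = n_taps - 1  # zero-lag index of every autocorrelation
peak = system[center]
largest_sidelobe = np.max(np.abs(np.delete(system, center)))

# On-focus output: each sensor signal s * h_fi is filtered by h_fi(-t), then summed.
s = rng.standard_normal(50)
y_on = sum(np.convolve(np.convolve(s, h), h[::-1]) for h in responses)

print(f"peak = {peak:.2f}, largest sidelobe = {largest_sidelobe:.2f}")
```

The zero-lag peaks of all channels add coherently (the peak equals the summed energy of the responses), while the off-peak terms fall at different lags for different sensors and therefore add less consistently.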
In (4), $s_{ON}(t)$ is the signal originating from the focal region, $h_{fi}(t)$ is the impulse response from the focal point to microphone $i$, and $N$ is the number of sensors. When $N$ is sufficiently large, the summation term on the right side of (4) will approximate a large-amplitude impulse and will enhance the component of the desired signal $s_{ON}(t)$ appearing at the microphones.

A distinct advantage of the MFA over simple beamforming is its ability to suppress perceptual reverberation from the captured signal. In a reverberant environment, the performance of the delay-sum beamformer suffers as reflected sound waves appear along the bore of the beam. It is shown in [4] that the potential SNR for the MFA is independent of the number of reflections, as compared with beamforming, where SNR decreases monotonically with the number of reflections. The ratio of signal to reverberant energy for the MFA was shown to be

[Equations (5) and (6) are illegible in the source; only the fragments "K-1)" and "K-1" survive.]

In the above, $K$ represents the number of reflections in the acoustical environment, and $N$ represents the number of microphones in the arrays. Principles of operation of the two arrays are shown in Figure 2. Spherical spreading and the attenuation due to absorption and propagation were ignored in the derivations of Equations 5 and 6 and in Figure 2.

Figure 2: Alignment of captured signals using delay-and-sum beamforming and using MFA processing.

In practice, the source-to-sensor responses used for the MFA must be truncated. Otherwise, at an 8-kHz sampling rate, the 0.5- to 1.5-second reverberation times typical of medium-sized conference rooms would require processing each microphone's output with an FIR filter several thousand taps long. Processing delay would also introduce an unnatural lag in teleconferencing applications.

Another problem caused by using full-length matched filters is the precursor (anticausal echo) in the output of the MFA. The on-focus system impulse response of the MFA described in (4) has two long tails with a large impulse at the center, as shown schematically in Figure 2. The forward tail in the MFA impulse response generates a precursor of the desired signal at the output. These early arrivals are subjectively unpleasant and become more easily perceptible as they occur further in advance of the dominant arrival. With truncation, the length of early arrival lead time is significantly reduced, and the precursor is not perceptually prominent in the output signal.

Use of matched filtering on each of the sensor signals in the MFA system provides spatial discrimination: signals arriving from the focus position will be enhanced relative to signals arriving from other locations. The processing combines the direct and reverberant arrivals coherently and suppresses the arrivals corresponding to off-focus signals, which are combined incoherently.

A conventional matched filter produces a peak in the amplitude of its output waveform at some time instant following the arrival of the input signal to which it is matched [7]. The matching process maximizes the filter's output energy (represented by the square of the filter's output amplitude peak) relative to the output noise energy due to the presence of stationary additive noise at the matched filter's input. The output signal-to-noise ratio is considered optimum for this condition. The signal-to-noise criterion is different, however, for sound capture in a reverberant environment. In the following paragraphs, the SNR performance of MFA processing will be qualitatively evaluated for two general situations: full matched filtering for rooms with varying absorption, and matched filtering with truncated matched filters.

The transfer function of a single, fully matched MFA channel is essentially the autocorrelation of the RTF for that sensor. The single RTF can be decomposed into the direct arrival $h_d(t)$ and the successive arrivals of the reverberant tail $h_r(t)$:

$$h_i(t) = h_d(t) + h_r(t). \tag{7}$$
The output of a single channel then consists of the convolution of the signal with the sum of the respective correlations:

$$y_i(t) = s(t) * [\phi_{dd}(t) + \phi_{dr}(t) + \phi_{rd}(t) + \phi_{rr}(t)] \tag{8}$$

Output energy is the integral of $y^2(t)$. The total output energy can be seen to contain products of the auto- and cross-correlations of the direct and reverberant components of $h_i(t)$. Signal energy in the output is due primarily to $\phi_{dd}(0)$ and $\phi_{rr}(0)$, while the reverberant noise energy is due to $\phi_{dd}(t)$ and $\phi_{rr}(t)$, $t \neq 0$, along with the smaller cross-correlations $\phi_{dr}(t)$ and $\phi_{rd}(t)$.
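The decomposition in (7) and (8) can be illustrated with a short numerical sketch. It assumes a hypothetical sampled response made of a unit direct arrival and a randomly signed, exponentially decaying tail; the helper `xcorr`, the tail onset, and the decay constant are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def xcorr(a, b):
    """Correlation sequence a(t) * b(-t), computed as a full linear convolution."""
    return np.convolve(a, b[::-1])

rng = np.random.default_rng(1)
n = 256

# Hypothetical direct arrival: a single unit impulse at sample 0.
h_d = np.zeros(n)
h_d[0] = 1.0

# Hypothetical reverberant tail: starts 20 samples later and decays exponentially.
h_r = np.zeros(n)
h_r[20:] = 0.5 * rng.standard_normal(n - 20) * np.exp(-np.arange(n - 20) / 60.0)

h = h_d + h_r  # decomposition (7)

# The four correlation terms inside the brackets of (8).
phi_dd = xcorr(h_d, h_d)
phi_dr = xcorr(h_d, h_r)
phi_rd = xcorr(h_r, h_d)
phi_rr = xcorr(h_r, h_r)

total = xcorr(h, h)  # full matched-filter channel response
zero_lag = n - 1     # index of lag 0 in each correlation sequence

print("phi_dd(0) =", phi_dd[zero_lag])
print("phi_rr(0) =", phi_rr[zero_lag])
print("phi_dr(0) + phi_rd(0) =", phi_dr[zero_lag] + phi_rd[zero_lag])
```

In this sketch the cross-correlations vanish at zero lag because the direct arrival and the tail do not overlap in time, so the main peak is built from the two auto-correlation peaks only.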
Figure 3: Illustration of the direct arrival and 3 reflections for sound propagation in a rectilinear enclosure.
As the room absorption is reduced and the reverberation time of the room increases, the amplitude of the reverberant tail $h_r(t)$ increases. This results in a significant increase in $\phi_{rr}(0)$. With squaring, the output signal energy due to the peak in the MFA response increases dramatically. The increase in the noise component is less than for the signal due to the incoherence in the reverberant arrivals. Hence, the SNR improvement offered by the MFA over the delay-sum beamformer increases as room reverberation times increase.

In the case of truncated MFA processing, $\phi_{dd}(0)$ is the only auto-correlation that contributes to the energy of the desired signal in the MFA output; all other contributions come from cross-correlations. The large signal contribution due to $\phi_{rr}(0)$, the peak of the auto-correlation of $h_r(t)$, is lost. It is replaced by the cross-correlation between the full and the truncated $h_r(t)$. As the early, and strongest, portions of $h_r(t)$ are maintained in the truncated version, the desired signal component in the MFA output due to this cross-correlation is reduced only slightly from $\phi_{rr}(0)$.
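The effect of truncation on the main peak can be sketched numerically. The example below assumes a hypothetical response with a unit direct arrival at sample 0 and an exponentially decaying random tail, sampled at an assumed 8 kHz; the function `truncated_mfa_response` and all constants are illustrative, not taken from the paper. The matched filter is built from only the first `keep` samples of the response.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000  # 0.5 s at an assumed 8 kHz sampling rate

# Hypothetical response: direct arrival plus exponentially decaying tail.
h = 0.3 * rng.standard_normal(n) * np.exp(-np.arange(n) / 600.0)
h[0] = 1.0

def truncated_mfa_response(h, keep):
    """Channel response when the matched filter uses only the first `keep`
    samples of h. Returns the response and the index of its main peak."""
    h_trunc = h[:keep]
    return np.convolve(h, h_trunc[::-1]), keep - 1

full_peak = np.sum(h ** 2)  # peak of the untruncated autocorrelation
for keep in (200, 800, 2000, n):
    resp, peak_idx = truncated_mfa_response(h, keep)
    retained = resp[peak_idx] / full_peak
    print(f"keep={keep:5d}: peak retains {retained:.3f} of its full value, "
          f"precursor spans at most {peak_idx} samples")
```

Because the peak of the truncated response is the energy of the retained samples, an exponentially decaying response loses little peak amplitude when its late portion is dropped, while the longest possible precursor shrinks in proportion to the retained length.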
3. SIMULATIONS

The effect of truncation on matched filtering was studied with simulated RTFs, which were computed using the Allen and Berkley image method for rectangular enclosures [1] as shown in Figure 3. It is assumed in the model that sound reflects from the enclosing walls at an incident angle equal and opposite to the arrival angle, and the amplitude of the reflection is attenuated by a reflection coefficient $\beta$, which is related to a wall's absorption coefficient $\alpha$ by the relation $\alpha = 1 - \beta^2$. A multi-path transfer response between an acoustic source and a transducer is illustrated in Figure 3, where images appear to be arriving from virtual sources located in image rooms. The amplitude of sound arriving from these sources is attenuated by the usual $1/r$ propagation factor, as well as the product of the reflection coefficients ($\beta$) associated with the set of reflection walls. The form of the resultant impulse response is given in (2).

Discrete sampling of the transfer function implies lowpass filtering of the response by an anti-aliasing filter. Since both practical transducers and typical enclosures attenuate very low frequencies (below 50 Hz), there is also a built-in physical highpass filter associated with the transfer function. The computation of the transfer functions for a given enclosure and microphone configuration is as follows. For each microphone in the array:

1. Compute the ideal transfer function $h(t)$ to 9th-image order.
2. Compute a discrete-time response using an 8 kHz lowpass filter.
3. Convolve with a 50 Hz digital highpass filter.

The resultant impulse response set $\{h_i(n)\}$ is assumed to comprise the discrete-time acoustical system response. To compute the truncated MFA response, each matched filter is convolved with a truncated version of itself, and then all the responses are added together to compute the overall transfer function. Truncation length is varied from the time of the first arrival to the $T_{60}$ time in 30 increments.

No simply calculated measure for SNR that reflected the subjective improvement afforded by the MFA was known to the authors. Therefore, all output energy associated with the large peak in the array's impulse response was classified as signal. Room reverberation produces the tails in the auto-correlated RTFs seen at each sensor's matched-filter output. The energy in the MFA output due to the resultant sum of all these reverberant tails was considered noise. SNR was computed as a function of the following parameter variations:

Parameter             Values
Absorption α          0.01, 0.1, 0.2, 0.4
# Microphones         10, 32, 100
Room size (meters)    Small (S): 3 x 4 x 5; Medium (M): 6 x 8 x 10; Large (L): 12 x 16 x 20

The source and sensors were randomly placed around the corresponding enclosure using a uniform distribution.
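Step 1 of the RTF computation can be sketched as a simplified image-method routine. This is a sketch under stated assumptions, not the authors' implementation: all six walls share one absorption coefficient, each image is rounded to the nearest sample, the `order` argument bounds the image index along each axis (one possible reading of "9th-image order"), the speed of sound is taken as 343 m/s, and the source and microphone positions are arbitrary. The lowpass and highpass filters of steps 2 and 3 are omitted.

```python
import itertools
import numpy as np

def image_method_rir(room, src, mic, alpha, order=9, fs=8000, c=343.0):
    """Simplified image-method response for a rectangular room.

    Assumptions (not from the paper): uniform absorption `alpha` on all walls,
    nearest-sample rounding of arrival times, src != mic, and `order` as the
    bound on the image index along each axis.
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    beta = np.sqrt(1.0 - alpha)  # reflection coefficient from alpha = 1 - beta^2
    max_dist = np.linalg.norm((2 * order + 2) * room)  # bound on image distance
    h = np.zeros(int(np.ceil(max_dist / c * fs)) + 2)
    idx = range(-order, order + 1)
    for n in itertools.product(idx, repeat=3):
        for q in itertools.product((0, 1), repeat=3):
            n_vec, q_vec = np.array(n), np.array(q)
            # Position of the virtual source in its image room.
            image = (1 - 2 * q_vec) * src + 2 * n_vec * room
            dist = np.linalg.norm(image - mic)
            # Number of wall reflections along the path from this image.
            reflections = np.sum(np.abs(n_vec - q_vec) + np.abs(n_vec))
            # 1/r spreading times the product of reflection coefficients.
            h[int(round(dist / c * fs))] += beta ** reflections / dist
    return h

# Small room from the parameter list, with a low order to keep the run short.
h_example = image_method_rir(room=(3, 4, 5), src=(1.0, 1.5, 2.0),
                             mic=(2.2, 3.1, 1.2), alpha=0.2, order=2)
print("response length:", h_example.size,
      "first arrival at sample", np.flatnonzero(h_example)[0])
```

Setting `alpha=1.0` makes every wall fully absorbing, which leaves only the direct arrival and gives a quick sanity check of the routine.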
$T_{60}$ defines the time by which the sound energy density in a room drops to 60 decibels below the initial sound energy density.

reverberant interference only.
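The truncation sweep and the SNR measure described in Section 3 (peak energy counted as signal, all remaining energy counted as reverberant noise) can be sketched as follows. The responses here are hypothetical random sequences with an exponential envelope and a direct arrival aligned at sample 0, standing in for the simulated RTFs; the function `mfa_snr_db`, the sensor count, and the decay constant are assumptions made for illustration.

```python
import numpy as np

def mfa_snr_db(responses, keep):
    """SNR of the truncated MFA response: energy at the main peak ("signal")
    over the energy in all other samples ("reverberant noise")."""
    # Each response is convolved with the time reverse of its truncated version.
    system = sum(np.convolve(h, h[:keep][::-1]) for h in responses)
    peak_idx = keep - 1  # every channel peaks at this index
    signal = system[peak_idx] ** 2
    noise = np.sum(system ** 2) - signal
    return 10.0 * np.log10(signal / noise)

rng = np.random.default_rng(3)
n_sensors, n_taps = 10, 1000
envelope = np.exp(-np.arange(n_taps) / 150.0)  # hypothetical decay

responses = []
for _ in range(n_sensors):
    h = 0.3 * rng.standard_normal(n_taps) * envelope
    h[0] = 1.0  # direct arrival, aligned at sample 0 for every sensor
    responses.append(h)

# 30 truncation lengths from the first arrival to the full response length.
lengths = np.linspace(1, n_taps, 30).astype(int)
snr = np.array([mfa_snr_db(responses, k) for k in lengths])

best = lengths[np.argmax(snr)]
print(f"best truncation in this sketch: {best} samples "
      f"({1000 * best / 8000:.1f} ms at 8 kHz), SNR = {snr.max():.2f} dB")
```

The numbers produced by this sketch depend entirely on the synthetic responses and are not comparable to the values in Table 1; only the procedure mirrors the text.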
# mics   Room size   α = 0.01   α = 0.1   α = 0.2   α = 0.4
10       S           8.56       8.59      8.61      8.75
                     44         26        61        57
10       M           8.68       8.70      8.72      8.91
                     128        111       102       59
10       L           8.73       8.79      8.86      9.04
                     124        231       207       248
32       S           11.79      11.93     12.09     12.29
                     …          …         …         …

(Remaining entries are not recoverable from the source; surviving unplaced fragments: 73, 20, 13.48, 32, 13.98.)
Table 1: Maximum MFA SNR (dB) - top number in block; optimum truncation length (ms) - bottom number in block.

Figure 5: SNR vs truncation length for various array configurations.

... by greater growth in the main peak of the autocorrelation, the incremental benefit is less significant. Hence it is desirable to truncate the filter bank earlier for systems with a greater number of channels. A related observation is that the growth in SNR with the number of sensors is less than predicted by (5). According to (5), there should be a 5.2 dB growth for every tripling of sensor count. Table 1 indicates an approximate growth of 3.3 dB with an increase from 10 to 32 sensors, and a growth of 1.5 dB with the increase from 32 to 100 sensors. This is again attributed to the partial coherence of individual channel autocorrelations with a corresponding growth in tail energy. It is predicted that SNR growth will be incrementally very small with sensor count increases when a very large number of sensors is used.

5. CONCLUSION

The MFA technique is shown to be effective in eliminating reverberant noise for array sound capture. Simulation results demonstrate that system SNR for reverberant noise can be made independent of enclosure characteristics. Simulation results are in good agreement with theoretical considerations and predicted performance trends.

6. REFERENCES

[1] J. B. Allen and D. A. Berkley. Image method for efficiently simulating small room acoustics. J. Acoust. Soc. Am., 65(4):943-950, April 1979.
[2] N. Aoshima. Computer-generated pulse signal applied for sound measurement. J. Acoust. Soc. Am., 69(5):1484-1488, May 1981.
[3] J. Borish and J. Angell. An efficient algorithm for measuring the impulse response using pseudorandom noise. J. Audio Eng. Soc., 31:478-487, 1983.
[4] J. L. Flanagan, A. C. Surendran, and E. E. Jan. Spatially selective sound capture for speech and audio processing. Speech Communication, 13:201-222, 1993.
[5] E. E. Jan. Parallel Processing of Large Scale Microphone Arrays for Sound Capture. PhD thesis, Rutgers University, New Brunswick, NJ, May 1995.
[6] R. J. Renomeron. Spatially selective sound capture for teleconferencing systems. Master's thesis, Rutgers University, New Brunswick, NJ, October 1997.
[7] George L. Turin. An introduction to matched filters. IRE Transactions on Information Theory, IT-6(3):311-329, June 1960.