New Insights Into The Noise Reduction Wiener Filter



Jingdong Chen, Member, IEEE, Jacob Benesty, Senior Member, IEEE, Yiteng (Arden) Huang, Member, IEEE,
and Simon Doclo, Member, IEEE

Abstract—The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered one of the most fundamental noise reduction approaches; it has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging, as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

Index Terms—Microphone arrays, noise reduction, speech distortion, Wiener filter.

Manuscript received December 20, 2004; revised September 2, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Li Deng. J. Chen and Y. Huang are with Bell Labs, Lucent Technologies, Murray Hill, NJ 07974 USA. J. Benesty is with the Université du Québec, INRS-EMT, Montréal, QC H5A 1K6, Canada. S. Doclo is with the Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Leuven 3001, Belgium. Digital Object Identifier 10.1109/TSA.2005.860851.

I. INTRODUCTION

SINCE we are living in a natural environment where noise is inevitable and ubiquitous, speech signals are generally immersed in acoustic ambient noise and can seldom be recorded in pure form. Therefore, it is essential for speech processing and communication systems to apply effective noise reduction/speech enhancement techniques in order to extract the desired speech signal from its corrupted observations.

Noise reduction techniques have a broad range of applications, from hearing aids to cellular phones, voice-controlled systems, multiparty teleconferencing, and automatic speech recognition (ASR) systems. The choice between using and not using a noise reduction technique may have a significant impact on the functioning of these systems. In multiparty conferencing, for example, the background noise picked up by the microphone at each point of the conference combines additively at the network bridge with the noise signals from all other points. The loudspeaker at each location of the conference therefore reproduces the combined sum of the noise processes from all other locations. Clearly, this problem can be extremely serious if the number of conferees is large, and without noise reduction, communication is almost impossible in this context.

Noise reduction is a very challenging and complex problem for several reasons. First of all, the nature and the characteristics of the noise signal change significantly from application to application, and moreover vary in time. It is therefore very difficult, if not impossible, to develop a versatile algorithm that works in diversified environments. Secondly, the objective of a noise reduction system is heavily dependent on the specific context and application. In some scenarios, for example, we want to increase the intelligibility or improve the overall speech perception quality, while in other scenarios, we expect to improve the accuracy of an ASR system, or simply reduce the listeners' fatigue. It is very hard to satisfy all objectives at the same time. In addition, the complex characteristics of speech and the broad spectrum of constraints make the problem even more complicated.

Research on noise reduction/speech enhancement can be traced back 40 years to two patents by Schroeder [1], [2], where an analog implementation of the spectral magnitude subtraction method was described. Since then it has become an area of active research. Over the past several decades, researchers and engineers have approached this challenging problem by exploiting different facets of the properties of the speech and noise signals. Some good reviews of such efforts can be found in [3]–[7]. Principally, the solutions to the problem can be classified from the following points of view.

• The number of channels available for enhancement; i.e., single-channel and multichannel techniques.
• How the noise is mixed with the speech; i.e., additive noise, multiplicative noise, and convolutional noise.
• The statistical relationship between the noise and the speech; i.e., uncorrelated or even independent noise, and correlated noise (such as echo and reverberation).
• How the processing is carried out; i.e., in the time domain or in the frequency domain.
In general, the more microphones are available, the easier the task of noise reduction. For example, when multiple realizations of the signal can be accessed, beamforming, source separation, or spatio-temporal filtering techniques can be applied to extract the desired speech signal or to attenuate the unwanted noise [8]–[13]. If we have two microphones, where the first microphone picks up the noisy signal and the second microphone is able to measure the noise field, we can use the second microphone signal as a noise reference and eliminate the noise in the first microphone by means of adaptive noise cancellation. However, in most situations, such as mobile communications, only one microphone is available. In this case, noise reduction techniques need to rely on assumptions about the speech and noise signals, or need to exploit aspects of speech perception, speech production, or a speech model. A common assumption is that the noise is additive and slowly varying, so that the noise characteristics estimated in the absence of speech can be used subsequently in the presence of speech. If in reality this premise does not hold, or only partially holds, the system will either achieve less noise reduction or introduce more speech distortion.

Even with the limitations outlined above, single-channel noise reduction has attracted a tremendous amount of research attention because of its wide range of applications and relatively low cost. A variety of approaches have been developed, including the Wiener filter [3], [14]–[19], spectral or cepstral restoration [17], [20]–[27], signal subspace methods [28]–[35], parametric-model-based methods [36]–[38], and statistical-model-based methods [5], [39]–[46].

Most of these algorithms were developed independently of each other, and generally their noise reduction performance was evaluated by assessing the improvement in signal-to-noise ratio (SNR), subjective speech quality, or ASR performance (when the ASR system is trained in clean conditions and additive noise is the only distortion source). Almost without exception, these algorithms achieve noise reduction by introducing some distortion to the speech signal. Some algorithms, such as the subspace method, are even explicitly formulated based on the tradeoff between noise reduction and speech distortion. However, so far, few efforts have been devoted to analyzing such a tradeoff behavior even though it is a very important issue. In this paper, we attempt to provide an analysis of the compromise between noise reduction and speech distortion. On one hand, such a study may offer us some insight into the range of existing algorithms that can be employed in practical noisy environments. On the other hand, a good understanding may help us to find new algorithms that can work more effectively than the existing ones.

Since there are so many algorithms in the literature, it is extremely difficult, if not impossible, to find a universal analytical tool that can be applied to any algorithm. In this paper, we choose the Wiener filter as the basis since it is one of the most fundamental approaches, and many algorithms are closely connected to this technique. For example, the minimum-mean-square-error (MMSE) estimator presented in [21], which belongs to the category of spectral restoration, converges to the Wiener filter at a high SNR. In addition, it is widely known that the Kalman filter is tightly related to the Wiener filter.

Starting from optimal Wiener filtering theory, we introduce a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated. We then show that for the single-channel Wiener filter, the amount of noise reduction is in general proportional to the amount of speech degradation, implying that when the noise reduction is maximized, the speech distortion is maximized as well.

Depending on the nature of the application, some practical noise-reduction systems require very high-quality speech but can tolerate a certain amount of residual noise, whereas other systems require the speech signal to be as clean as possible but may allow some degree of speech distortion. Therefore, it is necessary that we have some management scheme to control the compromise between noise reduction and speech distortion in the context of Wiener filtering. To this end, we discuss three approaches. The first approach leads to a suboptimal filter in which a parameter is introduced to control the tradeoff between speech distortion and noise reduction. The second approach leads to the well-known parametric-model-based noise reduction technique, where an AR model is exploited to achieve noise reduction while maintaining a low level of speech distortion. The third approach pertains to a multichannel approach where spatio-temporal filtering techniques are employed to obtain noise reduction with less or even no speech distortion.

II. ESTIMATION OF THE CLEAN SPEECH SAMPLES

We consider a zero-mean clean speech signal x(n) contaminated by a zero-mean noise process v(n) [white or colored but uncorrelated with x(n)], so that the noisy speech signal at discrete time sample n is

y(n) = x(n) + v(n).   (1)

Define the error signal between the clean speech sample at time n and its estimate

e_x(n) = x(n) - x^(n) = x(n) - h^T y(n)   (2)

where superscript T denotes the transpose of a vector or a matrix,

h = [h_0  h_1  ...  h_{L-1}]^T

is an FIR filter of length L, and

y(n) = [y(n)  y(n-1)  ...  y(n-L+1)]^T

is a vector containing the L most recent samples of the observation signal y(n).

We now can write the mean-square error (MSE) criterion

J_x(h) = E{e_x^2(n)}   (3)

where E{.} denotes mathematical expectation. The optimal estimate x^(n) of the clean speech sample tends to contain less noise than the observation sample y(n), and the optimal filter that forms x^(n) is the Wiener filter, which is obtained as follows:

h_o = arg min_h J_x(h).   (4)

Consider the particular filter h = u = [1  0  ...  0]^T.
This means that the observed signal y(n) will pass this filter unaltered (no noise reduction), and thus the corresponding MSE is

J_x(u) = E{v^2(n)} = sigma_v^2.   (5)

In principle, for the optimal filter h_o, we should have

J_x(h_o) <= J_x(u) = sigma_v^2.   (6)

In other words, the Wiener filter will be able to reduce the level of noise in the noisy speech signal y(n).

From (4), we easily find the Wiener-Hopf equation

R_y h_o = r_yx   (7)

where

R_y = E{y(n) y^T(n)}   (8)

is the correlation matrix of the observed signal and

r_yx = E{y(n) x(n)}   (9)

is the cross-correlation vector between the noisy and clean speech signals. However, x(n) is unobservable; as a result, an estimate of r_yx may seem difficult to obtain. But

r_yx = E{y(n)[y(n) - v(n)]} = r_y - r_v   (10)

where r_y = E{y(n) y(n)} and r_v = E{v(n) v(n)} [note that E{y(n) v(n)} = r_v since x(n) and v(n) are uncorrelated]. Now h_o depends on the correlation vectors r_y and r_v. The vector r_y (which is also the first column of R_y) can be easily estimated during speech-and-noise periods, while r_v can be estimated during noise-only intervals assuming that the statistics of the noise do not change much with time.

Using (10) and the fact that R_y = R_x + R_v, we obtain the optimal filter

h_o = R_y^{-1}(r_y - r_v) = [I - (R_x + R_v)^{-1} R_v] u   (11)

where

SNR = sigma_x^2 / sigma_v^2   (12)

is the signal-to-noise ratio, I is the L x L identity matrix, and R_x = E{x(n) x^T(n)} and R_v = E{v(n) v^T(n)} are the correlation matrices of the clean speech and of the noise. We have

lim_{SNR -> infinity} h_o = u   (13)
lim_{SNR -> 0} h_o = 0   (14)

where 0 has the same size as u and consists of all zeros. The minimum MSE (MMSE) is

J_x(h_o) = sigma_v^2 - r_v^T R_y^{-1} r_v.   (15)

We see clearly from the previous expression that J_x(h_o) <= sigma_v^2; therefore, noise reduction is possible. The normalized MMSE is

J~_x(h_o) = J_x(h_o) / J_x(u) = J_x(h_o) / sigma_v^2   (16)

and 0 <= J~_x(h_o) <= 1.

III. ESTIMATION OF THE NOISE SAMPLES

In this section, we will estimate the noise samples from the observations y(n). Define the error signal between the noise sample at time n and its estimate

e_v(n) = v(n) - v^(n) = v(n) - g^T y(n)   (17)

where

g = [g_0  g_1  ...  g_{L-1}]^T

is an FIR filter of length L. The MSE criterion associated with (17) is

J_v(g) = E{e_v^2(n)}.   (18)

The estimation of v(n) in the MMSE sense will tend to attenuate the clean speech.

The minimization of (18) leads to the Wiener-Hopf equation

R_y g_o = r_v.   (19)

We have

lim_{SNR -> infinity} g_o = 0   (20)
lim_{SNR -> 0} g_o = u.   (21)

The MSE for the particular filter g = u (no clean speech reduction) is

J_v(u) = E{[v(n) - y(n)]^2} = E{x^2(n)} = sigma_x^2.   (22)

Therefore, the MMSE and the normalized MMSE are, respectively,

J_v(g_o) = sigma_v^2 - r_v^T R_y^{-1} r_v   (23)
J~_v(g_o) = J_v(g_o) / J_v(u) = J_v(g_o) / sigma_x^2.   (24)

Since J_v(g_o) <= J_v(u) = sigma_x^2, the Wiener filter will be able to reduce the level of the clean speech in the signal y(n). As a result, 0 <= J~_v(g_o) <= 1.

In Section IV, we will see that while the normalized MMSE, J~_x(h_o), of the clean speech estimation plays a key role in noise reduction, the normalized MMSE, J~_v(g_o), of the noise process estimation plays a key role in speech distortion.
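The two Wiener-Hopf solutions above can be prototyped directly from sample statistics. The following NumPy sketch is illustrative only: the function names, the synthetic AR(1) "speech-like" source, and the 10-dB operating point are my assumptions, not material from the paper. It estimates R_y from the noisy signal and r_v from a noise-only segment, then forms h_o = R_y^{-1}(r_y - r_v) as in (11) and g_o = u - h_o as in (25).

```python
import numpy as np

def correlation_matrix(sig, L):
    # E{y(n) y^T(n)} estimated by averaging outer products of length-L snapshots,
    # with y(n) = [sig[n], sig[n-1], ..., sig[n-L+1]]^T
    Y = np.array([sig[n - L + 1:n + 1][::-1] for n in range(L - 1, len(sig))])
    return (Y.T @ Y) / Y.shape[0]

def wiener_filters(noisy, noise_only, L):
    # r_y is the first column of R_y; r_v is estimated from a noise-only interval
    R_y = correlation_matrix(noisy, L)
    r_y = R_y[:, 0]
    r_v = correlation_matrix(noise_only, L)[:, 0]
    h_o = np.linalg.solve(R_y, r_y - r_v)   # clean-speech Wiener filter, cf. (11)
    g_o = np.eye(L)[0] - h_o                # noise-estimation filter, cf. (19), (25)
    return h_o, g_o

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, L = 20000, 20
    # assumed toy signals: AR(1) source plus white noise at a 10-dB SNR
    x = np.zeros(N)
    w = rng.standard_normal(N)
    for n in range(1, N):
        x[n] = 0.9 * x[n - 1] + w[n]
    x /= np.std(x)
    v = rng.standard_normal(N) * 10 ** (-10 / 20)
    y = x + v
    h_o, g_o = wiener_filters(y, v, L)
    print("input SNR (dB):", 10 * np.log10(np.var(x) / np.var(v)))
    print("output SNR (dB):", 10 * np.log10(np.var(np.convolve(x, h_o)[:N]) /
                                            np.var(np.convolve(v, h_o)[:N])))
```

Applying h_o to the clean and noise components separately, as in the last two lines, gives an empirical version of the a priori and a posteriori SNRs compared analytically in Section IV.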
IV. IMPORTANT RELATIONSHIPS BETWEEN NOISE REDUCTION AND SPEECH DISTORTION

Obviously, there are some important relationships between the estimation of the clean speech and noise samples. From (11) and (19), we get a relation between the two optimal filters

h_o = u - g_o.   (25)

In fact, minimizing J_x(h) or J_v(u - h) with respect to h is equivalent. In the same manner, minimizing J_v(g) or J_x(u - g) with respect to g is the same thing. At the optimum, we have

e_{x,o}(n) = -e_{v,o}(n).   (26)

From (15) and (23), we see that the two MMSEs are equal

J_x(h_o) = J_v(g_o).   (27)

However, the normalized MMSEs are not equal in general. Indeed, we have a relation between the two

J~_x(h_o) = SNR . J~_v(g_o).   (28)

So the only situation where the two normalized MMSEs are equal is when the SNR is equal to 1. For SNR > 1, J~_x(h_o) > J~_v(g_o), and for SNR < 1, J~_x(h_o) < J~_v(g_o). Also, J~_x(h_o) <= 1 and J~_v(g_o) <= 1. A further inequality (29) can easily be verified from these definitions.

The optimal estimation of the clean speech, in the Wiener sense, is in fact what we call noise reduction

x^_o(n) = h_o^T y(n)   (30)

or equivalently, if the noise is estimated first,

v^_o(n) = g_o^T y(n)   (31)

we can use this estimate to reduce the noise from the observed signal

x^_o(n) = y(n) - v^_o(n).   (32)

The power of the estimated clean speech signal with the optimal Wiener filter is

E{x^_o^2(n)} = h_o^T R_x h_o + h_o^T R_v h_o   (33)

which is the sum of two terms. The first one is the power of the attenuated clean speech and the second one is the power of the residual noise (always greater than zero). While noise reduction is feasible with the Wiener filter, expression (33) shows that the price to pay for this is also a reduction of the clean speech [by a quantity equal to sigma_x^2 - h_o^T R_x h_o, and this implies distortion], since h_o^T R_x h_o <= sigma_x^2. In other words, the power of the attenuated clean speech signal is, obviously, always smaller than the power of the clean speech itself; this means that parts of the clean speech are attenuated in the process and, as a result, distortion is unavoidable with this approach.

We now define the speech-distortion index due to the optimal filtering operation as

v_sd(h_o) = E{[x(n) - h_o^T x(n)]^2} / E{x^2(n)}   (34)

where x(n) = [x(n)  x(n-1)  ...  x(n-L+1)]^T is the vector of the L most recent clean speech samples. Clearly, this index is always between 0 and 1 for the optimal filter. Also

lim_{SNR -> infinity} v_sd(h_o) = 0   (35)
lim_{SNR -> 0} v_sd(h_o) = 1.   (36)

So when v_sd(h_o) is close to 1, the speech signal is highly distorted, and when v_sd(h_o) is near 0, the speech signal is only slightly distorted. We deduce that for low SNRs, the Wiener filter can have a disastrous effect on the speech signal.

Similarly, we define the noise-reduction factor due to the Wiener filter as

xi_nr(h_o) = E{v^2(n)} / E{[h_o^T v(n)]^2} = sigma_v^2 / (h_o^T R_v h_o)   (37)

and xi_nr(h_o) >= 1. The greater xi_nr(h_o) is, the more noise reduction we have. Also

lim_{SNR -> infinity} xi_nr(h_o) = 1   (38)
lim_{SNR -> 0} xi_nr(h_o) = infinity.   (39)

Using (34) and (37), we obtain important relations, (40) and (41), between the speech-distortion index and the noise-reduction factor. Therefore, for the optimum filter, when the SNR is very large, there is little speech distortion and little noise reduction (which is not really needed in this situation). On the other hand, when the SNR is very small, speech distortion is large as well as noise reduction.
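Definitions (34) and (37) can be checked numerically. The minimal sketch below (the AR(1) "speech-like" covariance, the helper names, and the SNR values are my assumptions) evaluates the speech-distortion index and the noise-reduction factor of the Wiener filter from given correlation matrices, and illustrates the tradeoff just described: as the SNR drops, both the noise-reduction factor and the distortion index grow.

```python
import numpy as np

def speech_distortion_index(h, R_x):
    # (34): E{[x(n) - h^T x(n)]^2} / E{x^2(n)}, with x(n) the clean-speech vector
    u = np.eye(len(h))[0]
    d = u - h
    return (d @ R_x @ d) / R_x[0, 0]

def noise_reduction_factor(h, R_v):
    # (37): sigma_v^2 / E{[h^T v(n)]^2}
    return R_v[0, 0] / (h @ R_v @ h)

def ar1_cov(rho, sigma2, L):
    # assumed speech-like statistics: covariance sigma2 * rho^|i-j|
    idx = np.arange(L)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

L = 20
R_x = ar1_cov(0.9, 1.0, L)
u = np.eye(L)[0]
for snr in [0.1, 1.0, 10.0]:
    R_v = np.eye(L) / snr                  # white noise with sigma_v^2 = 1/SNR
    R_y = R_x + R_v
    h_o = np.linalg.solve(R_y, R_x @ u)    # h_o = R_y^{-1}(r_y - r_v) = R_y^{-1} R_x u
    print(f"SNR={snr:5.1f}  distortion index={speech_distortion_index(h_o, R_x):.4f}"
          f"  noise-reduction factor={noise_reduction_factor(h_o, R_v):.2f}")
```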
Fig. 1. Illustration of the areas where the noise-reduction factor and the speech-distortion index take their values as a function of the SNR. The former can take any value above the solid line, while the latter can take any value under the dotted line.

Another way to examine the noise-reduction performance is to inspect the SNR improvement. Let us define the a posteriori SNR, after noise reduction with the Wiener filter, as

SNR_o(h_o) = (h_o^T R_x h_o) / (h_o^T R_v h_o).   (42)

It can be shown that the a posteriori SNR and the a priori SNR satisfy SNR_o(h_o) >= SNR (see the Appendix), indicating that the Wiener filter is always able to improve the SNR of the noisy speech signal.

Knowing that SNR_o(h_o) >= SNR, we can now give a lower bound for the noise-reduction factor. As a matter of fact, it follows from (42) that (43), from which (44) can easily be shown. Similarly, we can derive the upper bound (45) for the speech-distortion index. Fig. 1 illustrates expressions (44) and (45).

We now introduce another index for noise reduction, (46). The closer this index is to 1, the more noise reduction we get. This index will be helpful in Sections V-VII.

V. PARTICULAR CASE: WHITE GAUSSIAN NOISE

In this section, we assume that the additive noise is white, so that

R_v = sigma_v^2 I.   (47)

From (16) and (24), we observe that the two normalized MMSEs are

J~_x(h_o) = h_{o,0}   (48)
J~_v(g_o) = (1 - g_{o,0}) / SNR   (49)

where h_{o,0} and g_{o,0} are the first components of the vectors h_o and g_o, respectively. Clearly, 0 <= h_{o,0} <= 1 and 0 <= g_{o,0} <= 1. Hence, the normalized MMSE J~_x(h_o) is completely governed by the first element of the Wiener filter h_o.

Now, the speech-distortion index and the noise-reduction factor for the optimal filter can be simplified as shown in (50) and (51), and further properties can be deduced from (50).

We know from linear prediction theory that [47]

R_y [1  -a^T]^T = [E  0  ...  0]^T   (52)

where a is the forward linear predictor and E is the corresponding error energy. Replacing the previous equation in (11), we obtain

h_o = u - c [1  -a^T]^T   (53)

where

c = sigma_v^2 / E.   (54)

Equation (53) shows how the Wiener filter is related to the forward predictor of the observed signal y(n). This expression also gives a hint on how to choose the length of the optimal filter: it should be equal to the length of the predictor required to have a good prediction of the observed signal y(n). Equation (54) contains some very interesting information. Indeed, if the clean speech signal is completely predictable, the error energy E tends to sigma_v^2 and c tends to 1. On the other hand, if x(n) is not predictable, we have E = sigma_x^2 + sigma_v^2 and c = 1/(1 + SNR). This implies that the Wiener filter is more efficient at reducing the level of noise for predictable signals than for unpredictable ones.
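The white-noise relationships of this section are easy to verify numerically. In the sketch below (an illustration under assumed AR(1) statistics; the symbols and the chosen SNR are mine), the normalized MMSE of the clean-speech estimate coincides with the first Wiener coefficient, and the a posteriori SNR of (42) is not smaller than the a priori SNR of (12).

```python
import numpy as np

def ar1_cov(rho, sigma2, L):
    idx = np.arange(L)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

L, snr = 20, 2.0
R_x = ar1_cov(0.9, 1.0, L)
sigma_v2 = 1.0 / snr
R_v = sigma_v2 * np.eye(L)                     # white noise, cf. (47)
R_y = R_x + R_v
u = np.eye(L)[0]
h_o = u - sigma_v2 * np.linalg.solve(R_y, u)   # white-noise Wiener filter: u - sigma_v^2 R_y^{-1} u

# normalized MMSE of the clean-speech estimate equals the first Wiener coefficient
J_min = sigma_v2 - sigma_v2 ** 2 * (u @ np.linalg.solve(R_y, u))   # (15) with r_v = sigma_v^2 u
print("J_min / sigma_v^2 =", J_min / sigma_v2, "   h_o[0] =", h_o[0])

# a posteriori SNR (42) is never below the a priori SNR (12)
snr_post = (h_o @ R_x @ h_o) / (h_o @ R_v @ h_o)
print("a priori SNR =", snr, "   a posteriori SNR =", snr_post)
```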
VI. BETTER WAYS TO MANAGE NOISE REDUCTION AND SPEECH DISTORTION

For a noise-reduction/speech-enhancement system, we always expect that it can achieve maximal noise reduction without much speech distortion. From the previous section, however, it follows that while noise reduction is maximized with the optimal Wiener filter, speech distortion is also maximized. One may ask the legitimate question: are there better ways to control the tradeoff between the conflicting requirements of noise reduction and speech distortion? Examining (34), one can see that to control the speech distortion, we need to minimize the numerator E{[x(n) - h^T x(n)]^2}. This can be achieved in different ways. For example, a speech signal can be modeled as an AR process. If the AR coefficients are known a priori or can be estimated from the noisy speech, these coefficients can be exploited to minimize E{[x(n) - h^T x(n)]^2} while simultaneously achieving a reasonable level of noise attenuation. This is often referred to as the parametric-model-based technique [36], [37]. We will not discuss the details of this technique here. Instead, in what follows we will discuss two other approaches to manage noise reduction and speech distortion in a better way.

A. A Suboptimal Filter

Consider the suboptimal filter

h_a = u - alpha g_o   (55)

where alpha is a real number. The MSE of the clean speech estimation corresponding to h_a is

J_x(h_a) = sigma_v^2 - alpha(2 - alpha) r_v^T R_y^{-1} r_v   (56)

and, obviously, J_x(h_a) >= J_x(h_o) for all alpha; we have equality for alpha = 1. In order to have noise reduction, alpha must be chosen in such a way that J_x(h_a) <= sigma_v^2; therefore

0 <= alpha <= 2.   (57)

We can check that (58) holds. Let

x^_a(n) = h_a^T y(n)   (59)

denote the estimate of the clean speech at time n with respect to h_a. The power of x^_a(n) is

E{x^_a^2(n)} = h_a^T R_x h_a + h_a^T R_v h_a.   (60)

The speech-distortion index corresponding to the filter h_a is

v_sd(h_a) = alpha^2 v_sd(h_o).   (61)

The previous expression shows that the ratio of the speech-distortion indices corresponding to the two filters h_a and h_o depends on alpha only.

In order to have less distortion with the suboptimal filter h_a than with the Wiener filter h_o, we must find alpha in such a way that

v_sd(h_a) <= v_sd(h_o)   (62)

hence, the condition on alpha should be

-1 <= alpha <= 1.   (63)

Finally, the suboptimal filter can reduce the level of noise of the observed signal, but with less distortion than the Wiener filter, if alpha is taken such that

0 <= alpha <= 1.   (64)

For the extreme cases alpha = 0 and alpha = 1 we obtain, respectively, h_a = u, no noise reduction at all but no additional distortion added, and h_a = h_o, maximum noise reduction with maximum speech distortion.

Since (65) holds, it follows immediately that the speech-distortion index and the noise-reduction factor due to h_a are given by (66) and (67). From (61), one can see that v_sd(h_a)/v_sd(h_o) = alpha^2, which is a function of alpha only. Unlike this speech-distortion ratio, the noise-reduction ratio does not only depend on alpha, but on the characteristics of both the speech and noise signals as well. However, using (56) and (15), we find the relation (68).

Fig. 2 plots the noise-reduction ratio and the speech-distortion ratio between the suboptimal filter and the Wiener filter, both as a function of alpha. We can see that when alpha = 0.7, the suboptimal filter achieves about 90% of the noise reduction of the Wiener filter, while the speech distortion is only 49% of that of the Wiener filter. In real applications, we may want the system to achieve maximal noise reduction while keeping the speech distortion as low as possible. If we define a cost function (69) to measure the compromise between the noise reduction and the speech distortion, it is trivial to see that the alpha that maximizes (69) is

alpha_o = 1/2.   (70)

In this case, the suboptimal filter achieves 75% of the noise reduction of the Wiener filter, while the speech distortion is only 25% of that of the Wiener filter.
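The alpha-controlled tradeoff can be demonstrated with a short numerical sketch. As a hedged stand-in for the paper's noise-reduction index, the reduction of the estimation MSE, sigma_v^2 - J_x(h), is used here; the toy AR(1) statistics and the alpha grid are assumptions. The distortion ratio comes out as alpha^2 and the MSE-reduction ratio as alpha(2 - alpha), consistent with the 49%/90% figures quoted above for alpha = 0.7.

```python
import numpy as np

def ar1_cov(rho, sigma2, L):
    idx = np.arange(L)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

L, snr = 20, 1.0
R_x, R_v = ar1_cov(0.9, 1.0, L), np.eye(L) / snr
R_y = R_x + R_v
u = np.eye(L)[0]
g_o = np.linalg.solve(R_y, R_v @ u)          # noise-estimation Wiener filter g_o = R_y^{-1} r_v
h_o = u - g_o                                # clean-speech Wiener filter, cf. (25)

def distortion(h):                           # speech-distortion index, cf. (34)
    d = u - h
    return (d @ R_x @ d) / R_x[0, 0]

def mse(h):                                  # J_x(h) = E{[x(n) - h^T y(n)]^2}
    return R_x[0, 0] - 2 * h @ (R_x @ u) + h @ R_y @ h

sigma_v2 = R_v[0, 0]
for alpha in [0.0, 0.5, 0.7, 1.0]:
    h_a = u - alpha * g_o                    # suboptimal filter, cf. (55)
    nr_ratio = (sigma_v2 - mse(h_a)) / (sigma_v2 - mse(h_o))
    sd_ratio = distortion(h_a) / distortion(h_o)
    print(f"alpha={alpha:.1f}: MSE-reduction ratio={nr_ratio:.2f} "
          f"(alpha(2-alpha)={alpha*(2-alpha):.2f}), distortion ratio={sd_ratio:.2f} (alpha^2)")
```

Choosing alpha = 0.5 maximizes the difference between the two ratios, which is the balance point discussed around (69) and (70).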
Fig. 2. The noise-reduction ratio (dashed line) and the speech-distortion ratio (solid line) between the suboptimal filter and the Wiener filter, both as a function of alpha.

Fig. 3. Illustration of the cost function J(alpha) in different SNR conditions, where both the signal and the noise are assumed to be Gaussian random processes and the weighting constant equals 0.7. The symbol on each curve marks the maximum of J(alpha) in the corresponding condition.

The parameter obtained in (70), which is optimal in terms of the tradeoff between noise reduction and speech distortion, can be used as guidance in designing a practical noise reduction system for applications like ASR.

Another way to obtain an optimal alpha is to define a discriminative cost function, (71), between the improvement in speech distortion and the degradation in noise reduction, where an application-dependent constant determines the relative importance of the two terms (e.g., in hearing-aid applications we may tune this parameter using subjective intelligibility tests). In contrast to (69), which is a function of alpha only, the cost function J(alpha) does not only depend on alpha, but on the characteristics of the speech and noise signals as well. Fig. 3 plots J(alpha) as a function of alpha in different SNR conditions, where both the signal and the noise are assumed to be Gaussian random processes. This figure shows that, for the same alpha, J(alpha) decreases with SNR, indicating that the higher the SNR, the better the suboptimal filter is able to control the compromise between noise reduction and speech distortion.

In order for the suboptimal filter to be able to control the tradeoff between noise reduction and speech distortion, alpha should be chosen so as to maximize the cost function (71), and the resulting maximum should be positive. From Fig. 3, we notice that it is always positive if the SNR is above 1 (0 dB). When the SNR drops below 1 (0 dB), however, it may become negative, indicating that the suboptimal filter cannot work reliably in very noisy conditions [when SNR < 1 (0 dB)].

Fig. 3 also shows the alpha that maximizes J(alpha) in different SNR situations. It is interesting to see that this alpha approaches 1 in very low SNR conditions, which means that the suboptimal filter converges to the Wiener filter in very low SNR conditions. As we increase the SNR, this alpha begins to decrease. It goes to 0 when the SNR is increased to 1000 (30 dB). This is understandable: when the SNR is very high, the speech signal is already very clean, so filtering is not really needed. By searching for the alpha that maximizes (71), the system can adaptively achieve the best tradeoff between noise reduction and speech distortion according to the characteristics of both the speech and noise signals.

B. Noise Reduction With Multiple Microphones

In more and more applications, multiple microphone signals are available. Therefore, it is interesting to investigate more deeply the multichannel case, where various techniques such as beamforming (nonadaptive and adaptive) and spatio-temporal filtering can be used to achieve noise reduction [13], [50]-[52]. One of the first papers to do so was written by Doclo and Moonen [13], where the optimal filter is derived as well as a general class of estimators. The authors also show how the generalized singular value decomposition can be used in this spatio-temporal technique. In this section, we take a slightly different approach. We will see, in particular, that we can reduce the level of noise without distorting the speech signal.

We suppose that we have a linear array consisting of N microphones whose outputs are denoted as y_i(n), i = 0, 1, ..., N - 1. Without loss of generality, we select microphone 0 as the reference point and, to simplify the analysis, we consider the following propagation model:

y_i(n) = a_i x(n - t - tau_i) + v_i(n)   (72)

where a_i is the attenuation factor (with a_0 = 1), t is the propagation time from the unknown speech source x(n) to microphone 0, v_i(n) is an additive noise signal at the ith microphone, and tau_i is the relative delay between microphones 0 and i, with tau_0 = 0.

In the following, we assume that the relative delays tau_i, i = 1, 2, ..., N - 1, are known or can easily be estimated. So our first step is the design of a simple delay-and-sum beamformer, which spatially aligns the microphone signals to the direction of the speech source.
From now on, we will work on the time-aligned signals (73).

A straightforward approach for noise reduction is to average the aligned signals as in (74). If the noises are added incoherently, the output SNR will, in principle, increase [48]. We can further reduce the noise by passing this averaged signal through a Wiener filter, as was shown in the previous sections. This approach has, however, two drawbacks. The first one is that, since the attenuation factors are in general not all equal, the output SNR will not improve that much; and the second one, as we know already, is the speech distortion introduced by the optimal filter.

Let us now define the error signal, for the ith microphone, between the clean speech sample and its estimate as in (75), where the filters involved are of length L. Since the aligned speech components are scaled versions of the same source signal, (75) becomes (76), which is the difference between two error signals: one term represents signal distortion and the other represents the residual noise. The MSE corresponding to the residual noise, with the ith microphone as the reference signal, is (77).

Usually, in the single-channel case, the minimization of the MSE corresponding to the residual noise is done while keeping the signal distortion below a threshold [28]. With no distortion, the optimal filter obtained from this optimization is u; hence, there is not any noise reduction either. The advantage of multiple microphones is that we can actually minimize the residual-noise MSE with the constraint of no speech distortion at all. Therefore, our optimization problem is (78). By using a Lagrange multiplier, we easily find the optimal solution (79), where we have assumed that the noise signals are not perfectly coherent so that the noise correlation matrix is not singular. This result is very similar to the linearly constrained minimum variance (LCMV) beamformer [51], [52], but in (79) additional attenuation factors have been included. Note also that this formula has been derived from a different point of view, as a multichannel extension of a single-channel MMSE noise-reduction algorithm.

Given the optimal filter, we can write the MMSE for the ith microphone as (80). Since we have N microphones, we have N MMSEs as well. The best MMSE from a noise reduction point of view is the smallest one, which is, according to (80), the one for the microphone signal with the smallest attenuation factor.

The attenuation factors a_i can be easily determined, if the power of the noise signals is known, by using the formula (81).

For the particular case where the noise is spatio-temporally white with a power equal to sigma_v^2, the MMSE and the normalized MMSE for the ith microphone are, respectively, (82) and (83).

As in the single-channel case, we can define for the ith microphone the speech-distortion index as in (84) and the noise-reduction factors as in (85) and (86).
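The constrained problem behind (78) and (79), minimizing the residual noise power subject to no speech distortion, is the same Lagrange-multiplier construction as an MVDR/LCMV beamformer. The sketch below is only a schematic of that idea under a simplified instantaneous mixing model; the attenuation profile, signal lengths, and white noise field are assumptions, and the paper's filter (79) is a more general spatio-temporal filter than this spatial-only toy.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 8, 50000
a = 1.0 / (1.0 + 0.3 * np.arange(N))      # assumed per-microphone attenuation factors, a_0 = 1
x = rng.standard_normal(T)                # time-aligned source as seen at the reference microphone
V = rng.standard_normal((N, T))           # spatio-temporally white noise, unit power per sensor
Y = np.outer(a, x) + V                    # y_i(n) = a_i x(n) + v_i(n) after time alignment

R_v = (V @ V.T) / T                       # noise spatial correlation matrix
w = np.linalg.solve(R_v, a)
w /= a @ w                                # w = R_v^{-1} a / (a^T R_v^{-1} a):
                                          # minimize w^T R_v w subject to w^T a = 1 (no distortion)

x_hat = w @ Y
print("speech distortion power:", np.mean((w @ np.outer(a, x) - x) ** 2))   # essentially zero
print("input SNR at microphone 0:", np.var(a[0] * x) / np.var(V[0]))
print("output SNR:", np.var(x) / np.var(w @ V))
```

The distortionless constraint keeps the speech component untouched, so all of the SNR gain comes from suppressing the noise, which is the qualitative behavior claimed for the multichannel filter above.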
With the optimal filter given in (79), for the particular case where the noise is spatio-temporally white with a power equal to sigma_v^2, the noise-reduction factors and the speech-distortion index can easily be computed in closed form. It can be seen that when the number of microphones goes to infinity, the two noise-reduction factors approach, respectively, infinity and 1, while the speech-distortion index goes to zero, which indicates that the noise can be completely removed with no signal distortion at all.

VII. SIMULATION EXPERIMENTS

By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, we have analytically examined the performance behavior of the Wiener-filter-based noise reduction technique. It was shown that the Wiener filter achieves noise reduction by distorting the speech signal: the more the noise is reduced, the more the speech is distorted. We also proposed several approaches to better manage the tradeoff between noise reduction and speech distortion. To further verify the analysis, and to assess the noise-reduction-and-speech-distortion management schemes, we implemented a time-domain Wiener-filter system. The sampling rate is 8 kHz. The noise signal is estimated in the time-frequency domain using a sequential algorithm presented in [6], [7]. Briefly, this algorithm obtains an estimate of the noise using the overlap-add technique on a frame-by-frame basis. The noisy speech signal is segmented into frames with a frame width of 8 ms and an overlapping factor of 75%. Each frame is then transformed via a DFT into a block of spectral samples. Successive blocks of spectral samples form a two-dimensional time-frequency matrix, where one index is the frame index, denoting the time dimension, and the other is the angular frequency. An estimate of the magnitude of the noise spectrum is then formed by the recursion in (87), in which an "attack" coefficient and a "decay" coefficient control the tracking. Meanwhile, to reduce its temporal fluctuation, the magnitude of the noisy speech spectrum is smoothed according to the recursion in (88), where again an "attack" coefficient and a "decay" coefficient are used. To further reduce the spectral fluctuation, both the smoothed noisy-speech magnitude and the noise-magnitude estimate are averaged across the neighboring frequency bins. Finally, an estimate of the noise spectrum is obtained by combining the estimated noise magnitude with the phase of the noisy speech spectrum, and the time-domain noise signal is obtained through the IDFT and the overlap-add technique. See [6], [7] for a more detailed description of this noise-estimation scheme.

Fig. 4. Noise and its estimate. The first trace (from the top) shows the waveform of a speech signal corrupted by a car noise where SNR = 10 (10 dB). The second and third traces plot the waveform and spectrogram of the noise signal. The fourth and fifth traces display the waveform and spectrogram of the noise estimate.

Fig. 4 shows a speech signal corrupted by a car noise [SNR = 10 (10 dB)], the waveform and the spectrogram of the car noise that is added to the speech, and the waveform and spectrogram of the noise estimate. It can be seen that during the absence of speech, the estimate is a good approximation of the noise signal. It is also noticed from its spectrogram that the noise estimate contains some minor speech components during the presence of speech. Our listening test, however, shows that the residual speech in the noise estimate is almost inaudible. An apparent advantage of this noise-estimation technique is that it does not require an explicit voice activity detector. In addition, our experimental investigation reveals that such a scheme is able to capture the noise characteristics in both the presence and the absence of speech; therefore, it does not rely on the assumption that the noise characteristics in the presence of speech stay the same as in the absence of speech.
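The exact recursions (87) and (88) are specified in [6], [7]; as a rough illustration of the attack/decay tracking idea described above, the following per-bin sketch shows one plausible form. The function name, the branch structure, and the coefficient values are assumptions for illustration only, not the scheme of [6], [7].

```python
import numpy as np

def track_noise_magnitude(noisy_mag, attack=0.995, decay=0.9):
    """Illustrative attack/decay tracker for one frequency bin.

    noisy_mag: smoothed noisy-speech magnitudes over successive frames.
    The estimate rises slowly (attack close to 1) when the input exceeds it,
    and falls quickly (smaller decay) otherwise, so it settles near the
    noise floor even while speech is present.  Coefficients are assumed values.
    """
    v_hat = np.empty_like(noisy_mag, dtype=float)
    v_hat[0] = noisy_mag[0]
    for k in range(1, len(noisy_mag)):
        if noisy_mag[k] >= v_hat[k - 1]:
            v_hat[k] = attack * v_hat[k - 1] + (1.0 - attack) * noisy_mag[k]
        else:
            v_hat[k] = decay * v_hat[k - 1] + (1.0 - decay) * noisy_mag[k]
    return v_hat
```

Running such a tracker independently in every frequency bin, and then smoothing across neighboring bins, yields a noise-magnitude estimate that needs no explicit voice activity detector, which is the property emphasized in the text.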
Fig. 5. Noise-reduction factor and signal-distortion index, both as a function of the filter length: (a) noise reduction and (b) signal distortion. The source is a signal recorded in a NYSE room; the background noise is a computer-generated white Gaussian random process; and SNR = 10 (10 dB).

Fig. 6. Noise-reduction factor and signal-distortion index, both as a function of the filter length: (a) noise reduction and (b) speech distortion. The source signal is an /i:/ sound from a female speaker; the background noise is a computer-generated white Gaussian process; and SNR = 10 (10 dB).

Based on the implemented system, we evaluate the Wiener filter for noise reduction. The first experiment investigates the influence of the filter length on the noise reduction performance. Instead of using the estimated noise, here we assume that the noise signal is known a priori. Therefore, this experiment demonstrates the upper limit of the performance of the Wiener filter. We consider two cases. In the first one, both the source signal and the background noise are random processes in which the current value of the signal cannot be predicted from its past samples. The source signal is a noise signal recorded in a New York Stock Exchange (NYSE) room. This signal consists of sound from various sources such as speakers, telephone rings, electric fans, etc. The background noise is a computer-generated Gaussian random process. The results for this case are graphically portrayed in Fig. 5. It can be seen that both the noise-reduction factor and the speech-distortion index increase linearly with the filter length. Therefore, a longer filter should be applied for more noise reduction. However, the more the noise is attenuated, the more the source signal is deformed, as shown in Fig. 5.

In the second case, we test the Wiener filter for noise reduction in the context of speech signals. It is known that a speech signal can be modeled as an AR process, where its current value can be predicted from its past samples. To simplify the situation for the ease of analysis, the source signal used here is an /i:/ sound recorded from a female speaker. As in the previous case, the background noise is a computer-generated white Gaussian random process. The results are plotted in Fig. 6. Again, the noise-reduction factor, which quantifies the amount of noise being attenuated, increases monotonically with the filter length; but unlike the previous case, the relationship between the noise reduction and the filter length is not linear. Instead, the curve at first grows quickly as the filter length is increased up to 10, and then continues to grow at a slower rate. Unlike the noise-reduction factor, the speech-distortion index exhibits a nonmonotonic relationship with the filter length. It first decreases to its minimum, and then increases again as the filter length is increased. The reason, as we explained in Section V, is that a speech signal can be modeled as an AR process. Particular to this experiment, the /i:/ sound used here can be well modeled with a sixth-order LPC (linear prediction coding) analysis. Therefore, when the filter length is increased to 6, the numerator of (34) is minimized and, as a result, the speech-distortion index reaches its minimum. Continuing to increase the filter length leads to a higher distortion due to more noise reduction.
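The filter-length effect just described can be explored with synthetic data. The sketch below is only a qualitative companion to the first experiment: the stable AR(6) "vowel-like" source, the 10-dB SNR, and the chosen filter lengths are my assumptions, not the recorded NYSE or /i:/ signals used in Figs. 5 and 6. It prints the speech-distortion index (34) and the noise-reduction factor (37) for several filter lengths.

```python
import numpy as np
from scipy.signal import lfilter

def empirical_cov(sig, L):
    Y = np.array([sig[n - L + 1:n + 1][::-1] for n in range(L - 1, len(sig))])
    return (Y.T @ Y) / Y.shape[0]

def indices_vs_length(x, v, lengths):
    results = []
    for L in lengths:
        R_x, R_v = empirical_cov(x, L), empirical_cov(v, L)
        R_y = R_x + R_v                     # x and v are generated independently
        u = np.eye(L)[0]
        h_o = np.linalg.solve(R_y, R_x @ u)
        d = u - h_o
        results.append(((d @ R_x @ d) / R_x[0, 0],        # speech-distortion index (34)
                        R_v[0, 0] / (h_o @ R_v @ h_o)))   # noise-reduction factor (37)
    return results

rng = np.random.default_rng(2)
T = 40000
# assumed stable AR(6) source built from three complex pole pairs
poles = np.array([0.95 * np.exp(1j * 0.3), 0.9 * np.exp(1j * 0.8), 0.85 * np.exp(1j * 1.5)])
a = np.poly(np.concatenate([poles, poles.conj()])).real
x = lfilter([1.0], a, rng.standard_normal(T))
x /= np.std(x)
v = rng.standard_normal(T) * 10 ** (-10 / 20)             # white noise, 10-dB SNR

lengths = [2, 6, 10, 20, 40]
for L, (sd, nr) in zip(lengths, indices_vs_length(x, v, lengths)):
    print(f"L={L:2d}: distortion index={sd:.4f}, noise-reduction factor={nr:.2f}")
```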
Fig. 7. Noise reduction in a car noise condition where SNR = 10 (10 dB): (a) clean speech and its spectrogram; (b) noisy speech and its spectrogram; and (c) noise-reduced speech and its spectrogram.

To further verify this observation, we investigated several other vowels and found that the curve of the speech-distortion index versus the filter length follows a similar shape, except that the minimum may appear in a slightly different location. Taking into account the sounds other than vowels in speech, which may be less predictable, we find that good performance with the Wiener filter (in terms of the compromise between noise reduction and speech distortion) can be achieved when the filter length is chosen around 20. Figs. 7 and 8 plot, respectively, the outputs of our Wiener filter system for SNR = 10 (10 dB) and SNR = 1 (0 dB), where the speech signal is from a female speaker, the background noise is a car noise signal, and the filter length is 20.

Fig. 8. Noise reduction in a car noise condition (same speech and noise signals as in Fig. 7) where SNR = 1 (0 dB): (a) noisy speech and its spectrogram and (b) noise-reduced speech and its spectrogram.

The second experiment tests the noise reduction performance in different SNR conditions. Here the speech signal is recorded from a female speaker, as shown in Fig. 7. Computer-generated Gaussian random noise is added to the speech signal to control the SNR. The length of the Wiener filter is set to 20. The results are presented in Fig. 9, where besides the noise-reduction factor and the speech-distortion index, we also plot the Itakura-Saito (IS) distance, a widely used objective quality measure that compares the spectral envelopes (AR parameters) of the clean and the processed speech [53]. Studies have shown that the IS measure is highly correlated (0.59) with subjective quality judgments [54]. A recent report reveals that the difference in mean opinion score (MOS) between two processed speech signals would be less than 1.6 if their IS measure is less than 0.5 for various codecs [55]. Many other reported experiments have confirmed that two spectra would be perceptually nearly identical if their IS distance is less than 0.1. All this evidence indicates that the IS distance is a reasonably good objective measure of speech quality.

As the SNR decreases, the observation signal becomes more noisy. Therefore, the Wiener filter is expected to achieve more noise reduction at low SNRs. This is verified by Fig. 9(a), where significant noise reduction is obtained in low SNR conditions. However, more noise reduction corresponds to more speech distortion. This is confirmed by Fig. 9(b) and (d), where both the speech-distortion index and the IS distance increase as the speech becomes more noisy.
Fig. 9. Noise reduction performance as a function of SNR in white Gaussian noise: (a) noise-reduction factor; (b) speech-distortion index; (c) Itakura-Saito distance between the clean and noisy speeches; and (d) Itakura-Saito distance between the clean and noise-reduced speeches.

TABLE I. NOISE REDUCTION PERFORMANCE WITH THE SUBOPTIMAL FILTER, WHERE THE FIRST IS DISTANCE IS BETWEEN THE CLEAN SPEECH AND THE FILTERED VERSION OF THE CLEAN SPEECH, WHICH PURELY MEASURES THE SPEECH DISTORTION DUE TO THE FILTERING EFFECT; THE SECOND IS BETWEEN THE CLEAN AND NOISE-REDUCED SPEECHES; AND THE THIRD IS BETWEEN THE CLEAN AND NOISY SPEECH SIGNALS.

Comparing the IS distance before [Fig. 9(c)] and after [Fig. 9(d)] noise reduction, one can see that a significant gain in the IS distance has been achieved, indicating that the Wiener filter is able to reduce noise and improve speech quality (but not necessarily speech intelligibility).

The third experiment verifies the performance behavior of the suboptimal filter derived in Section VI-A. The experimental conditions are the same as outlined in the previous experiment. The results are presented in Table I, where, for the purpose of comparison, besides the speech-distortion index and the noise-reduction factor, we also show three IS distances: between the clean and filtered speech signals (which purely measures the speech distortion due to the filtering effect), between the clean and noise-reduced speech signals, and between the clean and noisy signals.

One can see that the IS distance between the clean and noisy speech signals increases as the SNR drops. The reason for this is apparent.
Fig. 10. Noise-reduction factor and signal-distortion index, both as a function of the number of microphone sensors: (a) noise reduction; (b) speech distortion. The source signal is speech from a female speaker as shown in Fig. 7; the background noise is a computer-generated white Gaussian process; and SNR = 10 (10 dB).

When the SNR decreases, the speech signal becomes more noisy. As a result, the difference between the spectral envelope (or AR parameters) of the clean speech and that of the noisy speech tends to be more significant, which leads to a higher IS distance. It is noticed that the IS distance after noise reduction is much smaller than that before noise reduction. This significant gain in IS distance indicates that the noise reduction technique is able to mitigate the noise and improve speech quality. Comparing the results from both the Wiener and the suboptimal Wiener filters, we can see that a better compromise between noise reduction and speech distortion is accomplished by using the suboptimal filter. For example, in one SNR condition the suboptimal filter achieved a noise reduction of 2.0106, which is 82% of that with the Wiener filter; its speech-distortion index is 0.0006, which is only 54% of that of the Wiener filter; and the corresponding IS distance between the clean and filtered speech is 0.0281, which is only 17% of that of the Wiener filter. From the analysis shown in Section VI-A, we know that both the speech-distortion ratio and the noise-reduction ratio between the suboptimal filter and the Wiener filter are independent of the SNR. This can easily be verified from Table I. However, it is noted that the corresponding relative IS distance decreases with SNR, which may indicate that the suboptimal filter works more efficiently in higher SNR than in lower SNR conditions.

The last experiment investigates the performance of the multichannel optimal filter given in (79). Since the focus of this paper is on the reduction of additive noise, the reverberation effect is not considered here. To simplify the analysis, we assume that we have an equispaced linear array consisting of ten microphone sensors with a fixed spacing between adjacent microphones. There is only a single speech source (a speech signal from a female speaker) propagating from the far field to the array at a given incident angle (the angle between the wavefront and the line joining the sensors of the linear array). We further assume that all the microphone sensors have the same signal and noise power. The sampling rate is 16 kHz. For the experiment, we choose microphone 0 as the reference sensor, and synchronize the observation signals according to the time-difference-of-arrival (TDOA) information estimated using the algorithm presented in [56]. We then pass the time-aligned observation signals through the optimal filter given in (79) to extract the desired speech signal. The results for this experiment are graphically portrayed in Fig. 10. It can be seen that the noise-reduction index increases linearly with the number of microphones, while the speech distortion is approximately 0. Comparing Fig. 10 with Fig. 9, one can see that in the condition where SNR = 10 (10 dB), the multichannel optimal filter with four sensors achieves a noise reduction similar to the optimal single-channel Wiener filter, but with no speech distortion, which shows the advantage of using multiple microphones.

VIII. CONCLUSION

The problem of speech enhancement has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered one of the most fundamental noise-reduction approaches. It is widely known that the Wiener filter achieves noise reduction by deforming the speech signal. However, so far not much has been said about how the Wiener filter really works. In this paper we analyzed the inherent relationship between noise reduction and speech distortion with the Wiener filter. Starting from the speech and noise estimation using the Wiener theory, we introduced a speech-distortion index and two noise-reduction factors, and showed that for the single-channel Wiener filter, the amount of noise attenuation is in general proportional to the amount of speech degradation, i.e., more noise reduction incurs more speech distortion.

Depending on the nature of the application, some practical noise-reduction systems may require very high-quality speech but can tolerate a certain amount of noise, while other systems may want the speech as clean as possible even at the price of some speech distortion. Therefore, it is necessary to have management schemes to control the conflicting requirements of noise reduction and speech distortion. To do so, we have discussed three approaches. If we know the linear prediction coefficients of the clean speech signal, or they can be estimated from the noisy speech, these coefficients can be employed to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can use a suboptimal filter in which a free parameter is introduced to control the compromise between noise reduction and speech distortion.
By setting the free parameter to 0.7, we showed that the suboptimal filter can achieve 90% of the noise reduction of the Wiener filter, while the resulting speech distortion is less than half of that of the Wiener filter. In case we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

APPENDIX
RELATIONSHIP BETWEEN THE A PRIORI AND THE A POSTERIORI SNR

Theorem: With the Wiener filter in the context of noise reduction, the a priori SNR given in (12) and the a posteriori SNR defined in (42) satisfy

SNR_o(h_o) >= SNR.   (89)

Proof: From their definitions, we know that all three matrices R_x, R_v, and R_y are symmetric and positive semi-definite. We further assume positive definiteness where needed so that the required inverse exists. In addition, based on the independence assumption between the speech signal and the noise, we have R_y = R_x + R_v. In case both R_x and R_v are diagonal matrices, or R_x is a scaled version of R_v, it can easily be seen that SNR_o(h_o) = SNR. Here, we consider the more complicated situations where at least one of the R_x and R_v matrices is not diagonal. In this case, according to [49], there exists a linear transformation that can simultaneously diagonalize R_x, R_v, and R_y. The process is done as in (90), where again I is the identity matrix, (91) is the corresponding eigenvalue matrix, and (92) is the corresponding eigenvector matrix. Note that this transformation is not necessarily orthogonal since the underlying matrix is not necessarily symmetric. Then, from the definitions of the a priori and the a posteriori SNR, we immediately obtain (93) and (94), in which the two matrices involved are diagonal. If, for ease of expression, we introduce a shorthand for the diagonal entries, both SNR and SNR_o(h_o) can be rewritten as the ratios in (95).

Since all the quantities involved are nonnegative numbers, as long as we can show that the inequality (96) holds, then SNR_o(h_o) >= SNR. We now prove this inequality by way of induction.

• Basic Step: For the smallest case it is trivial to show that the inequality holds, so the property is true in this case, where equality holds when one of the two corresponding terms is equal to 0 (note that the two terms cannot be zero at the same time since the matrix is invertible) or when the two corresponding ratios coincide.

• Inductive Step: Assume that the property is true for a given order. We must prove that it is also true for the next order. As a matter of fact, expanding the corresponding sums gives (97). Using the induction hypothesis, and also the fact established in the basic step, we obtain (98), where equality holds when all the eigenvalues corresponding to the nonzero terms are equal. That completes the proof.

Even though it can improve the SNR, the Wiener filter does not maximize the a posteriori SNR. As a matter of fact, (42) is well known as the generalized Rayleigh quotient, so the filter that maximizes the a posteriori SNR is the eigenvector corresponding to the maximum eigenvalue of the matrix R_v^{-1} R_x. However, this filter typically gives rise to large speech distortion.
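As a quick numerical companion to the theorem (not part of the paper), the sketch below checks SNR_o(h_o) >= SNR on randomly generated positive definite R_x and R_v, and also reports the largest attainable a posteriori SNR, which is the top eigenvalue of R_v^{-1} R_x mentioned above. The matrix size, distributions, and regularization are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
L = 12
for trial in range(5):
    A = rng.standard_normal((L, L)); R_x = A @ A.T / L          # random PSD "speech" covariance
    B = rng.standard_normal((L, L)); R_v = B @ B.T / L + 0.1 * np.eye(L)
    R_y = R_x + R_v
    u = np.eye(L)[0]
    h_o = np.linalg.solve(R_y, R_x @ u)                          # Wiener filter
    snr_pri = R_x[0, 0] / R_v[0, 0]                              # a priori SNR, cf. (12)
    snr_post = (h_o @ R_x @ h_o) / (h_o @ R_v @ h_o)             # a posteriori SNR, cf. (42)
    # the Rayleigh quotient (42) is maximized by the top eigenvector of R_v^{-1} R_x
    lam_max = np.linalg.eigvals(np.linalg.solve(R_v, R_x)).real.max()
    print(f"trial {trial}: a priori={snr_pri:.3f}  a posteriori={snr_post:.3f}  "
          f"maximum attainable={lam_max:.3f}")
```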
REFERENCES

[1] M. R. Schroeder, "Apparatus for suppressing noise and distortion in communication signals," U.S. Patent 3 180 936, Apr. 27, 1965.
[2] M. R. Schroeder, "Processing of communication signals to reduce effects of noise," U.S. Patent 3 403 224, Sep. 24, 1968.
[3] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proc. IEEE, vol. 67, no. 12, pp. 1586–1604, Dec. 1979.
[4] J. S. Lim, Speech Enhancement. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[5] Y. Ephraim, "Statistical-model-based speech enhancement systems," Proc. IEEE, vol. 80, no. 10, pp. 1526–1554, Oct. 1992.
[6] E. J. Diethorn, "Subband noise reduction methods for speech enhancement," in Audio Signal Processing for Next-Generation Multimedia Communication Systems, Y. Huang and J. Benesty, Eds. Boston, MA: Kluwer, 2004, pp. 91–115.
[7] J. Chen, Y. Huang, and J. Benesty, "Filtering techniques for noise reduction and speech enhancement," in Adaptive Signal Processing: Applications to Real-World Problems, J. Benesty and Y. Huang, Eds. Berlin, Germany: Springer, 2003, pp. 129–154.
[8] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614–1626, Aug. 2001.
[9] S. E. Nordholm, I. Claesson, and N. Grbic, "Performance limits in subband beamforming," IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 193–203, May 2003.
[10] F. Asano, S. Hayamizu, T. Yamada, and S. Nakamura, "Speech enhancement based on the subspace method," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 497–507, Sep. 2000.
[11] F. Jabloun and B. Champagne, "A multi-microphone signal subspace approach for speech enhancement," in Proc. IEEE ICASSP, 2001, pp. 205–208.
[12] M. Brandstein and D. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications. Berlin, Germany: Springer, 2001.
[13] S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multimicrophone speech enhancement," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002.
[14] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[15] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113–120, Apr. 1979.
[16] R. J. McAulay and M. L. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 2, pp. 137–145, Apr. 1980.
[17] P. Vary, "Noise suppression by spectral magnitude estimation-mechanism and theoretical limits," Signal Process., vol. 8, pp. 387–400, Jul. 1985.
[18] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504–512, Jul. 2001.
[19] W. Etter and G. S. Moschytz, "Noise reduction by noise-adaptive spectral magnitude expansion," J. Audio Eng. Soc., vol. 42, pp. 341–349, May 1994.
[20] D. L. Wang and J. S. Lim, "The unimportance of phase in speech enhancement," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30, no. 4, pp. 679–681, Aug. 1982.
[21] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109–1121, Dec. 1984.
[22] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443–445, Apr. 1985.
[23] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech Audio Process., vol. 7, no. 2, pp. 126–137, Mar. 1999.
[24] Y. M. Chang and D. O'Shaughnessy, "Speech enhancement based conceptually on auditory evidence," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 1943–1954, Sep. 1991.
[25] T. F. Quatieri and R. B. Dunn, "Speech enhancement based on auditory spectral change," in Proc. IEEE ICASSP, vol. 1, May 2002, pp. 257–260.
CHEN et al.: NEW INSIGHTS INTO THE NOISE REDUCTION WIENER FILTER 1233

[26] L. Deng, J. Droppo, and A. Acero, “Estimation cepstrum of speech under [52] H. Cox, R. M. Zeskind, and M. M. Owen, “Robust adaptive beam-
the presence of noise using a joint prior of static and dynamic features,” forming,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no.
IEEE Trans. Speech Audio Process., vol. 12, no. 3, pp. 218–233, May 10, pp. 1365–1375, Oct. 1987.
2004. [53] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. En-
[27] , “Enhancement of log mel power spectra of speech using a phase- glewood Cliffs, NJ: Prentice-Hall, 1993.
sensitive model of the acoustic environment and sequential estimation [54] S. Quakenbush, T. Barnwell, and M. Clements, Objective Measures of
of the corrupting noise,” IEEE Trans. Speech Audio Process., vol. 12, Speech Quality. Englewood Cliffs, NJ: Prentice-Hall, 1988.
no. 2, pp. 133–143, Mar. 2004. [55] G. Chen, S. N. Koh, and I. Y. Soon, “Enhanced Itakura measure incorpo-
[28] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech rating masking properties of human auditory system,” Signal Process.,
enhancement,” IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. vol. 83, pp. 1445–1456, Jul. 2003.
251–266, Jul. 1995. [56] J. Benesty, “Adaptive eigenvalue decomposition algorithm for passive
[29] M. Dendrinos, S. Bakamidis, and G. Garayannis, “Speech enhancement acoustic source localization,” J. Acoust. Soc. Amer., vol. 107, pp.
from noise: A regenerative approach,” Speech Commun., vol. 10, pp. 384–391, Jan. 2000.
45–57, Feb. 1991.
[30] P. S. K. Hansen, “Signal Subspace Methods for Speech Enhancement,”
Ph.D., Tech. Univ. Denmark, Lyngby, 1997.
[31] S. H. Jensen, P. C. Hansen, S. D. Hansen, and J. A. Sørensen, “Reduction
Jingdong Chen (M'99) received the B.S. degree in electrical engineering and the M.S. degree in array signal processing from the Northwestern Polytechnic University in 1993 and 1995, respectively, and the Ph.D. degree in pattern recognition and intelligence control from the Chinese Academy of Sciences in 1998. His Ph.D. research focused on speech recognition in noisy environments. He studied and proposed several techniques covering speech enhancement and HMM adaptation by signal transformation.
From 1998 to 1999, he was with ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan, where he conducted research on speech synthesis and speech analysis, as well as objective measurements for evaluating speech synthesis. He then joined Griffith University, Brisbane, Australia, as a Research Fellow, where he engaged in research in robust speech recognition, signal processing, and discriminative feature representation. From 2000 to 2001, he was with ATR Spoken Language Translation Research Laboratories, Kyoto, where he conducted research in robust speech recognition and speech enhancement. He joined Bell Laboratories as a Member of Technical Staff in July 2001. His current research interests include adaptive signal processing, speech enhancement, adaptive noise/echo cancellation, microphone array signal processing, signal separation, and source localization. He is a co-editor/co-author of the book Speech Enhancement (Berlin, Germany: Springer-Verlag, 2005).
Dr. Chen is the recipient of a 1998–1999 research grant from the Japan Key Technology Center and the 1996–1998 President's Award from the Chinese Academy of Sciences.
Jacob Benesty (SM'04) was born in Marrakech, Morocco, in 1963. He received the Masters degree in microwaves from Pierre & Marie Curie University, France, in 1987, and the Ph.D. degree in control and signal processing from Orsay University, France, in April 1991.
During his Ph.D. program (from November 1989 to April 1991), he worked on adaptive filters and fast algorithms at the Centre National d'Etudes des Telecommunications (CNET), Paris, France. From January 1994 to July 1995, he worked at Telecom Paris on multichannel adaptive filters and acoustic echo cancellation. From October 1995 to May 2003, he was first a Consultant and then a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. In May 2003, he joined the Université du Québec, INRS-EMT, in Montréal, QC, Canada, as an associate professor. His research interests are in acoustic signal processing and multimedia communications. He co-authored the book Advances in Network and Acoustic Echo Cancellation (Berlin, Germany: Springer-Verlag, 2001). He is also a co-editor/co-author of the books Speech Enhancement (Berlin, Germany: Springer-Verlag, 2005), Audio Signal Processing for Next-Generation Multimedia Communication Systems (Boston, MA: Kluwer, 2004), Adaptive Signal Processing: Applications to Real-World Problems (Berlin, Germany: Springer-Verlag, 2003), and Acoustic Signal Processing for Telecommunication (Boston, MA: Kluwer, 2000).
Dr. Benesty received the 2001 Best Paper Award from the IEEE Signal Processing Society. He is a member of the editorial board of the EURASIP Journal on Applied Signal Processing. He was the co-chair of the 1999 International Workshop on Acoustic Echo and Noise Control.
Yiteng (Arden) Huang (S'97–M'01) received the B.S. degree from Tsinghua University in 1994, and the M.S. and Ph.D. degrees from the Georgia Institute of Technology (Georgia Tech), Atlanta, in 1998 and 2001, respectively, all in electrical and computer engineering.
During his doctoral studies from 1998 to 2001, he was a Research Assistant with the Center of Signal and Image Processing, Georgia Tech, and was a teaching assistant with the School of Electrical and Computer Engineering, Georgia Tech. In the summers from 1998 to 2000, he worked with Bell Laboratories, Murray Hill, NJ, and engaged in research on passive acoustic source localization with microphone arrays. Upon graduation, he joined Bell Laboratories as a Member of Technical Staff in March 2001. His current research interests are in multichannel acoustic signal processing, multimedia and wireless communications. He is a co-editor/co-author of the books Audio Signal Processing for Next-Generation Multimedia Communication Systems (Boston, MA: Kluwer, 2004) and Adaptive Signal Processing: Applications to Real-World Problems (Berlin, Germany: Springer-Verlag, 2003).
Dr. Huang was an Associate Editor of the IEEE SIGNAL PROCESSING LETTERS. He received the 2002 Young Author Best Paper Award from the IEEE Signal Processing Society, the 2000–2001 Outstanding Graduate Teaching Assistant Award from the School of Electrical and Computer Engineering, Georgia Tech, the 2000 Outstanding Research Award from the Center of Signal and Image Processing, Georgia Tech, and the 1997–1998 Colonel Oscar P. Cleaver Outstanding Graduate Student Award from the School of Electrical and Computer Engineering, Georgia Tech.

Simon Doclo (S'95–M'03) was born in Wilrijk, Belgium, in 1974. He received the M.Sc. degree in electrical engineering and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Belgium, in 1997 and 2003, respectively.
Currently, he is a Postdoctoral Fellow of the Fund for Scientific Research—Flanders, affiliated with the Electrical Engineering Department of the Katholieke Universiteit Leuven. In 2005, he was a Visiting Postdoctoral Fellow at the Adaptive Systems Laboratory, McMaster University, Hamilton, ON, Canada. His research interests are in microphone array processing for acoustic noise reduction, dereverberation and sound localization, adaptive filtering, speech enhancement, and hearing aid technology. He serves as Guest Editor for the Journal on Applied Signal Processing.
Dr. Doclo received the first prize "KVIV-Studentenprijzen" (with E. De Clippel) for the best M.Sc. engineering thesis in Flanders in 1997, a Best Student Paper Award at the International Workshop on Acoustic Echo and Noise Control in 2001, and the EURASIP Signal Processing Best Paper Award 2003 (with M. Moonen). He was secretary of the IEEE Benelux Signal Processing Chapter (1998–2002).