Electrical Engineering
October 2012
An Acoustic Echo
Cancellation System based on
Adaptive Algorithms
Contact Information:
Authors:
Veeratej Garre
E-mail: [email protected]
[email protected]
Supervisor 1:
Dr. Nedelko Grbic
School of Engineering (ING)
E-mail: [email protected]
Phone no: +46 455 38 57 27
Supervisor 2:
Mr. Magnus Berggren
School of Engineering (ING)
Email: [email protected]
Phone no.: +46 455 38 57 40
Examiner:
Dr. Sven Johansson
School of Engineering(ING)
Email: sven.johansson @ bth.se
Phone no.: +46 455 38 57 10
Abstract
Adaptive filtering is one of the core technologies in digital signal processing and finds
numerous applications in science as well as in industry, including echo cancellation,
adaptive noise cancellation, adaptive beamforming and adaptive equalization.
Acoustic echo is a common occurrence in today’s telecommunication systems. The
distraction caused by acoustic echo reduces the speech quality of the communication. An
acoustic echo canceller (AEC) works as follows: when the far-end signal is delivered to the
system, it is reproduced by the loudspeaker in the room. A microphone in the room picks up
the resulting direct-path sound and the consequent reverberant sound as the near-end signal.
The far-end signal is filtered and delayed to resemble the near-end signal, and the filtered
far-end signal is subtracted from the near-end signal. The resulting signal represents the
sounds present in the room, excluding any direct or reverberated sound produced by the
loudspeaker. An AEC based on adaptive filtering can enhance the speech quality in
hands-free and teleconferencing communication systems. The focus of this thesis is the
enhancement of speech corrupted by a reverberated signal in hands-free speech
communication, using an AEC with adaptive filtering. Many adaptive algorithms for echo
cancellation are available in the literature and every algorithm has its own properties, but
the aim of any algorithm used for echo cancellation is to achieve a high ERLE (the amount of
echo cancelled, in dB) at a high rate of convergence with low complexity.
The adaptive algorithms NLMS, APA and RLS for echo cancellation were successfully
implemented in MATLAB. The three algorithms are tested by simulation in three different
echo environments, obtained by changing the microphone position, the source position and
the room dimensions. The performance of the NLMS, APA and RLS algorithms is measured
with the ERLE parameter. The results show that the RLS algorithm performs well and
converges quickly, but its computational complexity is high, which makes it impractical in
real-time applications. The amount of echo cancellation achieved with the APA algorithm is
higher than with NLMS, and APA has lower computational complexity than RLS and is easy
to implement in real time. The amount of echo cancellation with NLMS is low compared to
RLS and APA, but NLMS is easy to implement in real time with low computational
complexity. The detailed comparison of the three algorithms in the three environments is
presented in section 6.
Acknowledgement
We would like to express our sincere gratitude to our thesis supervisors, Dr. Nedelko Grbic
and Mr. Magnus Berggren, for giving us the chance to do our thesis research under their
supervision, and to Dr. Sven Johansson, our examiner in the field of speech processing. We
thank them for their persistent help throughout the thesis work. Their deep knowledge of
this field helped us to learn new things and to complete the master thesis successfully. Their
continuous feedback and encouragement helped us throughout this work.
We extend our appreciation and thanks to our fellow students A.B.N Suresh kumar and
Harish Midathala for their suggestions and for the discussions on solving different problems
during this research.
We would like to thank BTH for providing a good educational environment in which we
could gain knowledge and learn about new technologies that helped us to move forward with
the thesis work.
Finally, we would like to extend our immense gratitude and wholehearted thanks to our
parents for their moral and financial support throughout our education. They motivated and
helped us to complete the thesis work successfully. We also thank our friends for their
support and encouragement during the thesis work, and we take this opportunity to thank all
the signal processing staff at BTH.
Lastly, we thank everyone who supported and helped us in any way towards the successful
completion of the thesis work.
List of figures
Figure 26: ERLE of APA at environment 2
Figure 27: Desired signal of APA at environment 3
Figure 28: Estimation error signal ‘e’ of APA at environment 3
Figure 29: ERLE of APA at environment 3
Figure 30: Desired signal of NLMS at environment 1
Figure 31: Estimation error signal ‘e’ of NLMS at environment 1
Figure 32: ERLE of NLMS at environment 1
Figure 33: Desired signal of NLMS at environment 2
Figure 34: Estimation error signal ‘e’ of NLMS at environment 2
Figure 35: ERLE of NLMS at environment 2
Figure 36: Desired signal of NLMS at environment 3
Figure 37: Estimation error signal ‘e’ of NLMS at environment 3
Figure 38: ERLE of NLMS at environment 3
Figure 39: Desired signal of RLS at environment 1
Figure 40: Estimation error signal ‘e’ of RLS at environment 1
Figure 41: ERLE of RLS at environment 1
Figure 42: Desired signal of RLS at environment 2
Figure 43: Estimation error signal ‘e’ of RLS at environment 2
Figure 44: ERLE of RLS at environment 2
Figure 45: Desired signal of RLS at environment 3
Figure 46: Estimation error signal ‘e’ of RLS at environment 3
Figure 47: ERLE of RLS at environment 3
Figure 48: ERLE comparison of NLMS, APA and RLS at environment 1 in graph
Figure 49: ERLE comparison of NLMS, APA and RLS at environment 1 in chart
Figure 50: ERLE comparison of NLMS, APA and RLS at environment 2 in graph
Figure 51: ERLE comparison of NLMS, APA and RLS at environment 2 in chart
Figure 52: ERLE comparison of NLMS, APA and RLS at environment 3 in graph
Figure 53: ERLE comparison of NLMS, APA and RLS at environment 3 in chart
List of tables
List of abbreviations
NLMS Normalized Least Mean Square
ASR Automatic Speech Recognition
SNR Signal-to-Noise Ratio
LMS Least Mean Square
RLS Recursive Least Squares
APA Affine Projection Algorithm
FIR Finite Impulse Response
IIR Infinite Impulse Response
FD Fractional Delay
RIR Room Impulse Response
ISM Image Source Model
ERLE Echo Return Loss Enhancement
RTF Room Transfer Function
GSC Generalized Side-lobe Canceller
LCMV Linearly Constrained Minimum Variance
SD Speech Distortion
AEC Acoustic Echo Cancellation
Contents
Abstract
Acknowledgement
List of figures
List of tables
List of abbreviations
1 Introduction
1.1 Hands-free speech enhancement
1.1.1 Applications
1.2 Hands-free speech communication problem
1.2.1 Background noise
1.2.2 Reverberation
1.2.3 Acoustic coupling
1.3 Fractional delay
2 Room reverberation
2.1 Introduction
2.2 Room image model
3 Adaptive filtering
3.1 Introduction
3.2 Adaptive filtering
3.3 Applications of adaptive filters
4 Acoustic echo cancellation
4.1 Introduction
4.2 Adaptive filter algorithm for echo cancellation
4.3 NLMS algorithm
4.4 RLS algorithm
4.5 APA algorithm
4.6 Echo return loss enhancement
5 Evaluation setup
5.1 Introduction
5.2 Evaluation setup for echo cancellation with adaptive algorithm
6 Results
6.1 Simulation results for echo cancellation using APA algorithm
6.1.1 At environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
6.1.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
6.1.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
6.2 Echo cancellation using the NLMS algorithm
6.2.1 At environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
6.2.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
6.2.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
6.3 Echo cancellation using the RLS algorithm
6.3.1 At environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
6.3.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
6.3.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
6.4 Comparing ERLE of APA, NLMS and RLS in three environments
6.4.1 At environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
6.4.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
6.4.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
8 Bibliography
1. Introduction
Hands-free communication is an area that has undergone tremendous advancement in the
recent past. It covers mobile telephony, hearing aids, automatic information systems (i.e.
voice-controlled systems), video conferencing systems and many multimedia applications.
More and more people use personal communication devices, personal computers and
wireless mobile telephones, which in turn are being transformed into advanced personal
communication systems. The advances in interpersonal communication systems result from
a continuous effort to improve and extend the interaction between individuals, providing not
only user safety and quality but also user friendliness. The combination of telephone
technologies and computers is making way for convenient hands-free communication.
Advances in wireless communication technology have made voice connectivity easy to
use in cellular communication and personal computer devices, enabling natural
communication in different environments such as cars, restaurants and offices. In automobile
applications, hand-controlled functions are operated by voice control; the signal degradations
in this field are the same as in distant-talker speech recognition applications. Audio
conferencing plays a key role in communication for both small and large firms, as it is cost
effective and comfortable for the user. At present, the demand for voice-controlled systems is
high, as hand-controlled functions are being replaced with voice controls that are efficient
and robust. Speech processing techniques have also been studied for their capability to
prevent hearing damage in high-noise environments and to improve speech intelligibility in
noise for hearing-impaired listeners.
Hands-free speech acquisition plays a vital role in all the above-mentioned applications.
In automated speech system design the microphone is placed far away from the user (the
speech transmitter and receiver are installed at remote places with a certain distance between
them), so that problems such as poor sound quality and acoustic echo arise at the far-end
side. The poor sound quality occurs because the microphone is placed near the loudspeaker:
environmental noise, interfering sounds and the reverberation of the speech signal from the
loudspeaker corrupt the actual speech signal. In full-duplex hands-free communication, the
acoustic echo generated at the microphone on the near-end side disturbs the speaker at the
far-end side, who hears his own voice with a delay of 100-200 ms. This reduces the
intelligibility of the received speech in noisy conditions and also degrades the performance
of speech recognition systems.
The degradation in the received speech signals makes conversation between the users
difficult. To improve the quality of hands-free mobile telephones, the major tasks to be
considered are background noise suppression, interference reduction and acoustic echo
cancellation. Several speech enhancement methods have been implemented to improve the
speech quality and reduce unwanted disturbances, in order to obtain a robust speech
communication system. Microphone arrays are a widely used technology for speech
enhancement in communication systems where speech quality and speech intelligibility are
degraded by a noisy environment and room reverberation.
The perception of a speech signal is measured in terms of quality and intelligibility.
Quality is a subjective measure that reflects the individual preferences of listeners [1].
Intelligibility is an objective measure that predicts the percentage of words that can be
correctly identified by listeners [1]. Speech enhancement is required when the transmitted or
received speech signals are degraded; its purpose is to improve noisy speech signals.
The received speech signals in automated speech systems are mainly corrupted by
background noise. In general, the background noise can be non-stationary, and the
signal-to-noise ratio (SNR) decreases as the noise level increases. For a few decades, speech
enhancement methods for acoustically disturbed signals have been researched widely, and the
development of digital hearing aids has contributed significantly to research in hands-free
communication systems.
Acoustic echo cancellation plays a key role in acoustically coupled environments,
because acoustic echo is a major cause of degraded speech intelligibility in speech
communication systems such as hearing aids and telecommunication systems. In this thesis,
the adaptive algorithms APA, NLMS and RLS are used to cancel the acoustic echo.
1.1 Hands-free speech enhancement
Speech enhancement is necessary in hands-free communication devices such as cellular
phones, teleconferencing systems and automatic information systems. For example, speech
signals produced in a room generate reverberation, which is noticeable when a hands-free
single-channel telephone system is used and binaural listening is not possible [2].
Hearing-impaired listeners also need normal speech to be enhanced to fit their individual
hearing capabilities.
Speech enhancement in hands-free mobile communication is possible by spectral
subtraction [2], by temporal filtering such as Wiener filtering and noise cancellation, or by
multi-microphone methods using different array techniques [2]. Array techniques are also
used to handle room reverberation. Hands-free speech communication is generally
characterized by a reduction in speech naturalness and intelligibility resulting from the
corruption of the speech sound field during data capture by the microphones, as well as by
speech distortion generated during data transmission and reproduction [3].
Hands-free speech enhancement is defined as the ability to improve the discrimination
between speech and background noise, reverberation and other types of interference
impinging on the microphones [3]. In hands-free communication systems, the perceptual
aspects quality and intelligibility are both important for speech enhancement. Quality and
intelligibility are not correlated and cannot always be achieved simultaneously. For example,
intelligibility can be improved by emphasizing the high-frequency content of the noisy
speech signal, but this typically comes at the expense of quality. In other words, for a noisy
speech signal a gain in intelligibility may have to be traded against a loss in quality. The
human hearing system is capable of discriminating speech in noisy, reverberant
environments.
1.1.1 Applications
Many speech enhancement systems try to imitate the human hearing mechanism,
drawing on its frequency selectivity, focused hearing and spatial localization of sound.
There are numerous applications of hands-free speech enhancement. A few important
applications are explained briefly below.
a) Hearing aids
Hearing aids are a remedy for hearing problems caused by unwanted disturbances.
Nearly 25 percent of the present human population suffers from hearing impairment caused
by damage to the hair cells of the inner ear through exposure to loud noise. Such exposure
occurs mainly in environments with industrial machinery, cooling systems, automobiles and
engines, and when listening to loud music through headsets. Exposing the human hearing
system to these environments may lead to temporary or permanent hearing loss. A hearing
aid amplifies the received signal. If the signal contains noise, the noise is amplified along
with the speech, and hearing-impaired people are incapable of distinguishing the speech
from the noise. The main problem for a hearing aid is acoustic echo, caused by the small
distance between the microphone and the loudspeaker. To overcome these problems,
microphone arrays for speech enhancement and acoustic echo cancellation are used.
In this thesis, hearing aids are considered as one of the applications, with the aim of
making the received speech signal more comfortable for the hearing-impaired person to
listen to by reducing the noise and echo that arise in various environments. During
communication, the speech signal is reverberated in the room by reflections from the walls.
The speech signal that reaches the far-end user is therefore corrupted by reverberation and by
the ambient noise in the environment.
b) Voice control and speech recognition systems
Advances in electrical technology have created a huge demand for consumer products,
telephones and personal devices, and these are rapidly being adapted to allow voice control.
To provide convenience and ease of use, a large number of systems are controlled by voice;
examples are lights and heating systems, power switching, opening windows and curtains,
and adjusting home entertainment systems [3].
The main aim of voice control and speech recognition systems is to replace
hand-controlled functions with voice controls in order to improve efficiency and optimize
speech-automated methods. Speech enhancement in ASR (Automatic Speech Recognition)
is intended to prevent ambient noise and room reverberation from degrading the quality of
the speech. It increases the quality of the received speech signal; ASR itself is based on
statistical pattern recognition. The degradation of the signal is calculated from the amount of
similarity between the clean speech expected by the recognizer and the noisy speech signal.
A microphone array technique can be used to improve the SNR of the received noisy speech
signal and to increase the speech intelligibility.
c) Audio-conferencing
The exploitation of broadband internet connections gave rise to advancements in
telecommunication and video communication systems for personal computers based on
internet protocols. Advances in wireless communication technology aim to increase the
speech intelligibility in desktop and mobile environments. Wireless communication is
frequently used in airports, offices, companies and restaurants. In these environments, the
ambient noise is composed of human babble noise, fan noise and the noise of moving objects
such as chairs and colliding items [3]. Normally a microphone is placed on top of the
monitor, close to the speaker’s eye level. The speaker and the microphone unit are placed at
an operating distance of 45-60 cm. Spectral subtraction algorithms and beamforming are used
as solutions for this kind of system.
Audio conferencing plays an important role in many large and small companies for
meetings and online study courses, as it is cost effective and saves time compared to travel.
Nowadays, conducting teleconferences with sophisticated and reliable technologies has
become essential for many firms and individuals. Conference rooms are characterized by
ambient noise, because the speech acquisition system is surrounded by all the participants in
the conference. As the speaker and the microphone are placed at varying distances, room
reverberation occurs in conference rooms. The distance between the user and the microphone
is large compared with other applications. These problems are best solved by using
microphone arrays and echo cancellation, which can detect the speech and reduce the echo.
In video technology, there are systems that steer and aim the camera at the speaker [3].
Figure 1: Typical hands-free speech communication environment
1.2.1 Background noise
Noise is present in any type of environment. Background noise is mostly due to
automobile traffic, engines, fans, background sound in public places, vibration from heavy
industry, and aircraft. In hands-free speech communication, background noise degrades the
performance of speech recognition systems and reduces the intelligibility of the speech,
which is a severe problem for hearing aid users. Acoustic disturbances arrive from different
directions, and background noise contains higher levels of low-frequency components than
the speech signal; spectral methods are therefore used to extract the speech signal. In
general, speech is characterized by a Laplacian distribution whereas background noise is
characterized by a Gaussian distribution, and by assuming a certain class of distribution,
techniques can be developed for extracting the speech or the background noise.
1.2.2 Reverberation
In closed environments the speech signal is reflected by the walls, the ceiling and the
objects in the room, as illustrated in Figure 1. These reflections disturb the speech on its way
from the loudspeaker to the microphone. The reverberation time is the time it takes for a
room impulse response to decay by 60 dB from its largest peak. The energy of the
reverberation in the enclosure depends on the locations of the acoustic sensors and the source
in the room and on the distances between them.
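The 60 dB decay definition can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the synthetic impulse response, the sampling rate `fs`, the function name `rt60_from_rir`, and the use of Schroeder backward integration with a straight-line fit are choices made here and are not taken from the thesis.

```python
import numpy as np

def rt60_from_rir(h, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate the reverberation time (in seconds) of an impulse response h.

    Assumed method: Schroeder backward integration gives the energy decay
    curve; a line fitted between the two levels in fit_range_db is
    extrapolated to a 60 dB drop.
    """
    energy = np.asarray(h, dtype=float) ** 2
    # Energy remaining in the response from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    upper, lower = fit_range_db
    idx = np.where((edc_db <= upper) & (edc_db >= lower))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second, negative
    return -60.0 / slope

# Hypothetical test response: white noise whose amplitude envelope
# falls by 60 dB over rt60_true seconds.
fs = 8000
rt60_true = 0.4
t = np.arange(fs) / fs                      # one second of samples
rng = np.random.default_rng(0)
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / rt60_true)
rt60_est = rt60_from_rir(h, fs)
print(f"estimated reverberation time: {rt60_est:.2f} s")
```

Fitting only the upper part of the decay curve is a common way to avoid the noise floor at the end of a measured response; with a simulated response the choice of fitting range matters less.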
The reverberation effect can be reduced by keeping the microphone close to the source
of interest. Reflections affect the direct speech of the user on its way to the receiver and blur
its temporal and spatial characteristics. This is not acceptable in hands-free communication
such as telephone systems, because it adds unwanted disturbance and reverberation for the
listener in real time and reduces the quality of the speech signal. In speech recognition and
verification applications, performance is reduced in highly reverberant environments.
De-reverberation also benefits hearing-impaired listeners, as it increases speech
intelligibility [4].
1.2.3 Acoustic coupling
In hands-free duplex communication, the reflected transmission path between the
loudspeaker and the microphone is the echo path. In full-duplex communication, the far-end
signal emitted by the loudspeaker propagates in the environment and is picked up by the
microphones in the same way as other interfering signals [3]. The acoustic echo that occurs
during full-duplex hands-free communication degrades the speech intelligibility and disturbs
the user, who hears his own speech after some delay. In a hands-free communication system
the SNR is also reduced because of the large distance between the microphone and the
speaker, which leaves the signal exposed to ambient noise.
can be cancelled using adaptive algorithms such as NLMS, RLS and APA algorithms.
1.3 Fractional delay
Among digital filters, fractional delay filters are used for band-limited interpolation.
Band-limited interpolation is a technique for evaluating a sampled signal at an arbitrary point
in time, even if that point lies between two sample points. For the interpolated value to be
exact, the signal must be band-limited to half the sampling rate (Fs/2), which implies that the
continuous-time signal can be exactly reconstructed from the sampled data. The sample value
can then be evaluated at any arbitrary time, even if the signal is fractionally delayed. The
fractional delay is calculated relative to the last integer multiple of the sampling interval.
Fractional delay filters can be implemented as FIR or IIR filters.
Fractional delay filters are used in various applications, such as speech coding and
synthesis, sample rate conversion, beam steering, and the design of digital differentiators
and integrators. In all of these fields the fixed sampling period is a limitation. Fractional
delay filters have a flat phase delay over a wide frequency band, with the value of the phase
delay approximating the desired fractional delay, and are normally used for modeling
non-integer delays. These filters are therefore used in many real-time applications where the
actual sampling instants are important. A fractional delay is a non-integer multiple of the
sampling interval, which is assumed to be uniform. These filters make it possible to observe
signal values at arbitrary locations within the sampling interval [5].
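As an illustration of band-limited interpolation with an FIR filter, the following Python sketch builds a windowed-sinc fractional delay filter and applies it to a low-frequency sinusoid. The filter length, the Hamming-type window, the delay of 15.3 samples and the function name `fractional_delay_fir` are assumptions made for this example; the thesis does not state which fractional delay design it uses.

```python
import numpy as np

def fractional_delay_fir(delay, num_taps=31):
    """FIR approximation of a delay of `delay` samples (may be non-integer).

    The ideal band-limited interpolator is a sinc shifted by `delay`;
    a Hamming-type window centred on the delay truncates it.
    `delay` should lie close to (num_taps - 1) / 2.
    """
    n = np.arange(num_taps)
    h = np.sinc(n - delay)
    window = 0.54 + 0.46 * np.cos(2.0 * np.pi * (n - delay) / (num_taps - 1))
    h = h * window
    return h / h.sum()          # unity gain at zero frequency

# Delay a sinusoid by 15.3 samples and compare with the analytic result.
delay = 15.3
h = fractional_delay_fir(delay)
f = 0.05                                    # cycles per sample, well below 0.5
n = np.arange(200)
x = np.sin(2.0 * np.pi * f * n)
y = np.convolve(x, h)[:n.size]
expected = np.sin(2.0 * np.pi * f * (n - delay))
# Skip the first len(h) samples, where the filter is still filling up.
max_error = np.max(np.abs(y[len(h):] - expected[len(h):]))
print(f"maximum deviation from the ideal delayed sinusoid: {max_error:.4f}")
```

The approximation is most accurate for signal components well below half the sampling rate; close to Fs/2 a finite-length filter of this kind deviates from the ideal delay.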
2. Room reverberation
2.1 Introduction
In speech communication systems such as hands-free mobile telephones, hearing aids,
teleconferencing systems and voice-controlled systems, the received microphone signals are
degraded by background noise, reverberation and other interference. The performance of
automatic speech recognition systems decreases due to this degradation of the signal.
In this study of reverberation, the multi-path propagation of an acoustic sound from its
source to the microphone is analyzed. The reverberant signal can be described as an audio
signal with coloration and noticeable echo. The received microphone signal consists of
1. Direct sound
2. Early reverberation and
3. Late reverberation as shown in Figure 3
Figure 3: Illustration of the direct sound, early reverberation and late reverberation from
the source to the microphone.
The direct sound is the first signal received by the microphone, the early reverberation is
the part of the signal that arrives shortly after the direct sound, and the late reverberation is
the part that arrives after the early reverberation. The detrimental perceptual effects of
reverberation are primarily caused by late reverberation and generally increase with
increasing distance between the source and the microphone. Conversely, early reverberation
tends to improve the intelligibility of speech. In combination with the direct sound it is
sometimes referred to as the early speech component [6].
An acoustic echo canceller is used to eliminate the far-end echo signal. A post-processor
is applied to reduce the background noise and the residual echo that is not eliminated by the
echo canceller. Hands-free systems are often used in noisy and reverberant environments, so
the received microphone signal contains not only the desired signal but also interference,
such as room reverberation caused by the desired source and a far-end echo signal that
results from the sound produced by the loudspeaker [6].
Figure 5: Illustration of a direct path and a single reflection of the desired source to the microphone.
The degradation of the signal received at the microphone is due to the reverberation
introduced by the multi-path propagation of the desired speech signal to the microphone, as
shown in Figure 5.
Figure 7: Reverberated environment with reflected source images
The red part is the origin. The x-coordinate of the virtual sources can be
expressed using the sequence below
( 2.1 )
Here xs is the x-coordinate of the sound source and xr is the length of the room in the
x-dimension. The sequence gives the location of the ith virtual source for each integer value
of i. If i is negative, the virtual source is located on the negative x-axis. If i = 0, the virtual
source is the real source. The distance along the x-axis between the ith virtual sound source
and the microphone is found by subtracting the microphone's x-coordinate, xm, from xi. This
is shown below.
( 2.2 )
The relative positions of the virtual sources along the y and z axes can be found in a similar
fashion using equations 2.2 and 2.3.
( 2.3 )
( 2.4 )
( 2.5 )
( 2.6 )
Here c is the speed of sound in meters per second. The value ts is estimated for the multiple
reflections that make up the reverberation. At every reflection some energy is lost, which is
modeled by the reflection coefficient α. The calculation of the reflection coefficient and its
effect are explained in [9].
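The image model described above can be sketched in Python as follows. This is a hedged illustration, not the thesis implementation: the expression used for the virtual source coordinates is the standard image-source form for a rectangular room and is assumed to correspond to the equations referred to above; the single reflection coefficient `alpha`, the 1/distance attenuation, `c = 343` m/s, `fs = 8000` Hz, the reflection order and the rounding of delays to whole samples are all assumptions. The room, microphone and source positions are those listed for environment 1.

```python
import numpy as np

def image_coords(src, room_len, order):
    """Virtual source coordinates along one axis for i = -order..order.

    Assumed standard form: x_i = (-1)^i * x_s + (i + (1 - (-1)^i) / 2) * x_r.
    Also returns |i|, the number of wall reflections along this axis.
    """
    i = np.arange(-order, order + 1)
    sign = np.where(i % 2 == 0, 1.0, -1.0)           # (-1)^i
    return sign * src + (i + (1.0 - sign) / 2.0) * room_len, np.abs(i)

def simple_rir(room, mic, src, alpha=0.8, order=3, fs=8000, c=343.0):
    """Room impulse response from images up to `order` per axis."""
    (xi, nx), (yj, ny), (zk, nz) = [
        image_coords(s, r, order) for s, r in zip(src, room)
    ]
    # Offsets of every virtual source from the microphone.
    dx, dy, dz = np.meshgrid(xi - mic[0], yj - mic[1], zk - mic[2],
                             indexing="ij")
    reflections = nx[:, None, None] + ny[None, :, None] + nz[None, None, :]
    dist = np.sqrt(dx ** 2 + dy ** 2 + dz ** 2)
    delay = np.rint(dist / c * fs).astype(int)       # whole samples only
    gain = alpha ** reflections / dist
    h = np.zeros(delay.max() + 1)
    np.add.at(h, delay.ravel(), gain.ravel())        # sum coinciding arrivals
    return h

# Environment 1: room [3, 4, 2.5], microphone [1, 2, 1], source [1, 1, 1].
h_env1 = simple_rir(room=[3.0, 4.0, 2.5], mic=[1.0, 2.0, 1.0],
                    src=[1.0, 1.0, 1.0])
first_arrival = int(np.flatnonzero(h_env1)[0])
print("direct sound arrives at sample", first_arrival)
```

Rounding each delay to the nearest sample is the simplest choice; a fractional delay filter, as discussed in section 1.3, could place each arrival more precisely.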
Figure 8: Illustration of a direct sound (red color) and a reverberated sound (blue color) in a
closed room environment.
The effect of reverberation on a signal is shown in Figure 8. The red signal in the figure is
the original speech signal and the blue signal is the reverberant signal, whose amplitude is
increased by the addition of reflected energy at particular samples.
3. Adaptive filtering
3.1 Introduction
Signal processing is used in electrical engineering, systems engineering and
applied mathematics. It is a tool for the representation, manipulation and
transformation of signals and the information they contain. In the past, the
prevailing technology for signal processing was analog signal processing, which
involved both linear and nonlinear circuits. The rapid advancement of digital
computer technology and integrated circuit fabrication gave rise to an area of
science and engineering called digital signal processing (DSP). DSP techniques are
applied widely because of their programmability, low cost, small size and low power
consumption [10]. One widely used specialized branch of digital signal processing is
adaptive signal processing, which is mainly concerned with adaptive filters and their
applications.
The one basic feature common to all adaptive filters is:
An input vector and a desired response are used to compute an estimation error,
which in turn is used to control the values of a set of adjustable filter coefficients through
a feedback loop and an algorithm [11].
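The feedback loop described above can be sketched in a few lines of Python. The example
below uses the plain LMS update as one possible choice of algorithm; the function name, the
step size mu, the two-tap unknown system h and the random test input are assumptions made
only for illustration.

```python
import random

def lms(x, d, num_taps, mu):
    """Adapt FIR coefficients so the filter output tracks d; returns (weights, errors)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Input vector [x(n), x(n-1), ..., x(n-N+1)], with zeros before the signal starts.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))          # filter output
        e = d[n] - y                                      # estimation error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]    # error controls the coefficients
        errors.append(e)
    return w, errors

# Illustrative system identification: the desired response is x passed through h.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3]
d = [h[0] * x[n] + (h[1] * x[n - 1] if n >= 1 else 0.0) for n in range(len(x))]
w, errors = lms(x, d, num_taps=2, mu=0.05)
```

In this noise-free setting the coefficients w should approach h while the error decays
towards zero, which is the behaviour the feedback loop is meant to produce.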
the same bandwidth as that of speech [18]. An adaptive noise canceller for speech
signals has two inputs. The desired input consists of voice that is corrupted by noise
(the speech signal), and the reference input contains noise that is related in some way
to the noise in the desired input. The reference noise is passed through the filter so
that it becomes as similar as possible to the noise in the desired input, and the
filtered version is subtracted from the desired input. Removing the noise from the
desired input in this way yields the noise-free signal. The setup is shown in
Figure 10. In a practical system the noise is not completely removed, but its level is
reduced considerably.
filter output or the estimation (prediction) error may serve as the system output. In the
first case the system operates as a predictor; in the latter case it operates as a prediction
error filter. The setup is shown in Figure 11.
d) Interference cancellation
In this application, an adaptive filter is used to cancel unknown interference
contained alongside an information signal component in a primary signal, with the
cancellation being optimized in some sense (fig 1.4). The primary signal serves as
the desired response for the adaptive filter. A reference (auxiliary) signal is employed
as the input to the adaptive filter. The reference signal is derived from a sensor or set
of sensors located in relation to the sensors supplying the primary signal in such a
way that the information signal component is weak or essentially undetectable [18].
data rates [12]. These effects are due to the out-of-boundary transmission medium and
the multipath effects in the radio channel. A typical communication system is depicted
in Figure 13.
Figure 14: Adaptive equalizer (block labels: additive noise, adaptive weights, error e(n),
supervised and unsupervised training)
The equalizer is designed to be adaptive to the channel variation in the transmission of
high speed data over a band limited channel. The equalizer is recursively updated by an
adaptive algorithm based on the observed channel output for reconstructing the output
signal. The configuration of an adaptive equalizer is depicted in Figure 14.
f) Acoustic echo cancellation
An acoustic echo canceller can overcome the acoustic echo that interferes with
teleconferencing and hands-free telecommunication. It adaptively identifies the transfer
function between a loudspeaker and a microphone, and then produces an echo replica
that is subtracted from the real echo [13]. Echo occurs when an audio source and
sink operate in full duplex mode. In this situation the received signal is output
through the telephone loudspeaker (audio source); this audio signal then
reverberates through the physical environment and is picked up by the system's
microphone (audio sink). The result is that time-delayed and attenuated images of the
original speech are returned to the distant user [18].
The present study deals with cancelling these echo signals to improve communication
quality, using several adaptive filtering algorithms and comparing their performance
when applied to echo cancellation.
Echo cancellation is critical to achieving high quality voice transmission over packet
networks, which typically face transmission delays above 30 to 40 ms. These long
delays make echo readily apparent to listeners, so the echo must be eliminated in order to
provide viable telephony service [14].
4. Acoustic echo cancellation
4.1 Introduction
In hands-free speech communication the main aim of the system is to provide
good voice quality and good intelligibility of the speech when two or more people
communicate with each other from different locations. During such communication,
acoustic echo between talkers and listeners degrades the voice quality and may reduce the
intelligibility of the signal.
Figure 15: Hands-free communication system with echo paths in a conference room
The phenomenon in which a delayed and distorted version of the original speech
signal or electrical signal is reflected back to the speech source is known as echo.
Acoustic echo is a type of noise that occurs due to reflections of the speech signal by the
walls, ceiling or objects of a room; it is also described as acoustic coupling between the
loudspeaker and the microphone. The main aim in hands-free communication is to cancel
the acoustic echo in order to provide an echo-free environment for the talkers during the
communication. In this thesis the main focus is on simulating acoustic echo cancellation
using APA.
Figure 15 shows a hands-free communication system with echo paths in a conference
room. The far-end speech played by the loudspeaker reaches the near-end microphone
along several paths, namely the direct path and paths reflected from the walls, ceiling and
objects in the room, forming an echo that is sent back to the far end. This disturbs the
speech quality of the signal and is a major problem in communication systems.
To overcome the acoustic echo problem in hands-free communication systems such as
hearing aids and teleconferencing, several methods have been designed using directional
microphones. AEC has been implemented to reduce echo in hands-free communication.
The AEC helps to eliminate echo and to enhance the quality of speech in communication
systems, providing clear, smooth and comfortable communication for the participants in
the conference room. Echo cancellation is achieved using adaptive algorithms such as
LMS, NLMS, RLS and APA, which all follow the same procedure to cancel echo in any
communication application. In this thesis, the main focus is on the APA, NLMS and RLS
adaptive filter algorithms. Figure 16 shows how AEC is implemented using adaptive
filters in three basic steps.
Figure 16: Implementation of acoustic echo cancellation using the adaptive filter
The three basic steps of AEC using adaptive algorithms are [16]:
1. Estimate the characteristics of the echo path of the room.
2. Create a replica of the echo signal.
3. Subtract the echo replica from the microphone signal in order to obtain a clean speech
signal.
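Steps 2 and 3 can be sketched as follows, assuming that step 1 has already produced an FIR
estimate of the echo path. The function names, the estimate w_hat and the short test signals
are hypothetical and chosen only to make the example easy to follow.

```python
def echo_replica(far_end, w_hat):
    """Step 2: filter the far-end signal with the estimated echo path w_hat."""
    return [
        sum(w_hat[k] * far_end[n - k] for k in range(len(w_hat)) if n - k >= 0)
        for n in range(len(far_end))
    ]

def cancel_echo(mic, far_end, w_hat):
    """Step 3: subtract the echo replica from the microphone signal."""
    replica = echo_replica(far_end, w_hat)
    return [m - r for m, r in zip(mic, replica)]

# Illustrative signals: a unit pulse from the far end and a constant near-end signal.
far_end = [1.0, 0.0, 0.0, 0.0]
near_end = [0.1, 0.1, 0.1, 0.1]
true_path = [0.0, 0.6, 0.3]                       # assumed echo path of the room
mic = [s + e for s, e in zip(near_end, echo_replica(far_end, true_path))]

# With a perfect estimate (w_hat equal to the true path) only the near-end signal remains.
clean = cancel_echo(mic, far_end, w_hat=true_path)
```

In practice w_hat is only an approximation produced by the adaptive algorithm, so a small
residual echo remains in the output.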
Therefore, AEC plays a major role in communication systems by suppressing the effect of
the acoustic coupling between microphone and loudspeaker. When this coupling generates
echo, it degrades the quality of the sound and the intelligibility of the speech.
fed back into the adaptive filter, and its coefficients are changed algorithmically in order
to minimize the cost function. In the case of acoustic echo cancellation, the optimal output
of the adaptive filter is equal in value to the unwanted echoed signal. When the adaptive
filter output is equal to the desired signal, the error signal goes to zero. In this situation
the echoed signal is completely cancelled and the far-end user does not hear any of their
original speech returned to them.
where ║·║² denotes the squared Euclidean norm and β is the normalized step size with
0 < β < 2. Replacing µ in the LMS weight vector update equation with µ(n) leads to the NLMS
algorithm, which is given by
( 4.3 )
║ ║
( 4.4 )
where d(n) is the desired signal.
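A minimal Python sketch of an NLMS filter is given below for illustration. It uses the common
textbook form, in which the step size β is divided by the squared Euclidean norm of the
current input vector; the small constant eps that guards against division by zero, the
function name, the four-tap echo path h and the random test input are assumptions that are
not part of the equations above.

```python
import random

def nlms(x, d, num_taps, beta=1.0, eps=1e-8):
    """Normalized LMS; returns (weights, a priori errors)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Input vector [x(n), x(n-1), ..., x(n-N+1)], with zeros before the signal starts.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
        norm_sq = sum(uk * uk for uk in u)               # squared Euclidean norm of x(n)
        step = beta / (norm_sq + eps)                    # time-varying step size mu(n)
        w = [wk + step * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors

# Illustrative echo path and far-end signal (assumptions).
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
h = [0.0, 0.6, 0.3, -0.1]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
w, errors = nlms(x, d, num_taps=4, beta=1.0)
```

Because the step is scaled by the input power, the same β can be used for loud and quiet
passages of speech, which is the main practical difference from LMS.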
Advantages and disadvantages:
The NLMS algorithm has a good convergence speed, which makes it useful for echo
cancellation. It shows greater stability with unknown input signals, and the normalized step
size reduces noise amplification. It reaches a low steady-state error with faster convergence
than LMS. Compared with the LMS algorithm, the NLMS algorithm requires additional
computations to evaluate the normalization term ║x(n)║². The NLMS algorithm requires 3N+1
multiplications per iteration, which is N more than the LMS algorithm.
= ( 4.5 )
( 4.8 )
( 4.11 )
X(n)=
( 4.15 )
The objective of the affine projection algorithm is to minimize
$$\| w(n) - w(n-1) \|^{2} \qquad (4.16)$$
subject to
$$d(n) - X^{T}(n)\, w(n) = 0 \qquad (4.17)$$
The resulting coefficient update is
$$w(n) = w(n-1) + \mu\, X(n) \left( X^{T}(n)\, X(n) + \gamma I \right)^{-1} e(n) \qquad (4.18)$$
with µ chosen in the range 0 < µ ≤ 2.
The affine projection algorithm maintains the next coefficient vector w(n) as close as
possible to the current w(n-1), while forcing the a posteriori error to be zero [28].
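The update in (4.18) can be sketched in plain Python as shown below. The sketch assumes that
the columns of X(n) are the most recent input vectors, that e(n) is the a priori error vector
d(n) − Xᵀ(n) w(n−1), and that the small linear system is solved by Gaussian elimination. The
helper names, the projection order of 2, γ = 1e-6 and the test signals are illustrative
assumptions.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def past(seq, i):
    """Sample seq[i], or zero before the start of the signal."""
    return seq[i] if i >= 0 else 0.0

def apa(x, d, num_taps, order, mu=1.0, gamma=1e-6):
    """Affine projection algorithm; returns the final coefficient vector."""
    w = [0.0] * num_taps
    for n in range(len(x)):
        # Columns of X(n): the `order` most recent input vectors x(n), x(n-1), ...
        cols = [[past(x, n - p - k) for k in range(num_taps)] for p in range(order)]
        # A priori error vector e(n) = d(n) - X^T(n) w(n-1).
        e = [past(d, n - p) - sum(wk * ck for wk, ck in zip(w, cols[p]))
             for p in range(order)]
        # R = X^T(n) X(n) + gamma I, then g = R^{-1} e(n).
        R = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (gamma if i == j else 0.0)
              for j in range(order)] for i in range(order)]
        g = solve(R, e)
        # w(n) = w(n-1) + mu X(n) g
        w = [wk + mu * sum(g[p] * cols[p][k] for p in range(order))
             for k, wk in enumerate(w)]
    return w

# Illustrative echo path and far-end signal (assumptions).
rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
h = [0.5, -0.3, 0.2, 0.1]
d = [sum(h[k] * past(x, n - k) for k in range(len(h))) for n in range(len(x))]
w = apa(x, d, num_taps=4, order=2)
```

With a projection order of 1 the update reduces to a regularized NLMS step, which is one way
to check an implementation of this kind.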
Using techniques similar to those that led from RLS to fast RLS (FRLS), a fast version of
APA, the fast affine projection (FAP) algorithm, may be derived. Its features include
LMS-like complexity, and a further property of the affine projection algorithm is that it
causes no delay in the input or output signals. These features make APA an excellent
candidate for an adaptive filter in the acoustic echo cancellation problem. APA can be seen
as an extension of NLMS: instead of only the current input vector, each update uses the most
recent input vectors collected in X(n), which gives faster convergence.
Advantages and disadvantages:
APA has faster tracking capabilities than NLMS. APA has better steady-state MSE and
transient response than NLMS. In terms of performance and complexity, APA lies between
NLMS and RLS.
$$\mathrm{ERLE} = 10 \log_{10}\!\left( \frac{P_{d}}{P_{e}} \right)\ \mathrm{dB} \qquad (4.19)$$
where P_d is the power of the input desired signal and P_e is the power of the residual error
signal after echo cancellation.
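As a small worked example, the ERLE described above, the ratio of the desired-signal power to
the residual-error power expressed in dB, could be computed as in the following sketch. The
function name and the test signals are assumptions, and the signal power is taken as the mean
of the squared samples.

```python
import math

def erle_db(desired, residual):
    """ERLE in dB: 10*log10 of desired-signal power over residual-error power."""
    p_d = sum(v * v for v in desired) / len(desired)
    p_e = sum(v * v for v in residual) / len(residual)
    return 10.0 * math.log10(p_d / p_e)

# Illustrative signals: the residual has one tenth of the amplitude of the desired signal.
desired = [1.0, -1.0, 1.0, -1.0]
residual = [0.1, -0.1, 0.1, -0.1]
erle = erle_db(desired, residual)
```

A tenfold reduction in amplitude is a hundredfold reduction in power, so the value here is
about 20 dB. The sketch does not handle a residual that is exactly zero.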
5. Evaluation setup
5.1 Introduction
This thesis deals with the elimination of disturbances due to echo that occur during
hands-free speech communication. These disturbances were explained in the previous
chapters. Echo cancellation using the adaptive algorithms APA, NLMS and RLS is implemented
in MATLAB, and the implementation is explained in this chapter. The aim is to implement and
evaluate an adaptive echo canceller using the APA, NLMS and RLS algorithms.
This chapter deals with the implementation and analysis of the adaptive echo
canceller, one of the best speech enhancement systems for hands-free speech communication,
which was discussed in detail in the previous chapter. The implementation and experimental
setup of the system are discussed in the next section, together with the parameters
considered in order to reach optimum values. Finally, the results of the adaptive echo
canceller and the evaluation of its performance in different environments are plotted in the
results section.
The performance of the acoustic echo canceller depends on parameters such as spectrum,
background noise level, pitch variability, gender, language and age. Voices with a strong
pitch converge more easily than voices with a soft pitch. The intensity of sound is defined
as sound power per unit area, and the perception of loudness is related to both the sound
pressure level and the duration of a sound.
The NLMS, APA and RLS implementations suppress the echo and noise in the acoustic echo
cancellation system. The test signal (Speech_all.wav) contains four sentences spoken
alternately by a female and a male voice. The sampling frequency of the speech signal is
16000 Hz and its duration is 11 seconds. The four sentences are described in Table 1. The
input to the algorithm is the clean speech signal of the far-end user, x(n), and the desired
signal is the reverberated signal received at the near-end microphone. The reverberated
signal is generated for three closed-room environments with different room dimensions,
microphone positions and source positions, using the RIR described in section 2 with the
reflection coefficient α=-0.8 in MATLAB. The three environments are:
Environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
Environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
Environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
The room impulse responses of the three environments are shown below.
Figure 20: Room impulse response of environment 3
Table 1: Details of the clean speech signal used for evaluation
File name        Duration (s)   Type of voice   Sentence
Speech_all.wav   3              Female          “It’s easy to tell the depth of the well.”
                 2              Male            “Kick the ball straight and follow through.”
                 3              Female          “Glue the sheet to the dark blue background.”
                 3              Male            “A pot of tea helps to pass the evening.”
The filter order is taken as 500, 1000, 1500, 2000 and 2500 for AEC with NLMS,
APA and RLS. The algorithms are tested with different parameter values (trial and error)
within a limited range in order to fix the values that give the highest amount of echo
cancellation. The NLMS implementation is described in section 3.4; the step size β=1 is used
and the reverberated signal is taken as the desired signal. The RLS implementation is
described in section 3.5; the exponential weighting factor is λ=1 and the value used to
initialize P(0) is δ=0.1. The APA implementation is described in section 3.6; the step size
µ=1 is used and the projection order is set to 20, a value chosen to balance a fast
convergence rate against a small steady-state estimation error. The parameters of the three
algorithms were tested with different values, and the values giving the highest amount of
echo cancellation (ERLE) were selected. The microphone signal contains the reverberated
speech signal of the far-end user; no noise signal is added in this experiment. An acoustic
echo cancellation system using adaptive algorithms is explained in chapter 4. The estimation
error is plotted for filter order 2500. The ERLE is plotted against the filter order (number
of coefficients) for every system in the three environments, and the performance of the
three systems is compared in each environment. The ERLE is the ratio of the power of the
input desired signal to the power of the residual error signal immediately after echo
cancellation. The calculated ERLE value measures the echo loss achieved by the adaptive
filter.
6. Results
The desired signal received by the microphone in environment 1 is shown in Figure 21. The
error signal estimated by adaptive filtering with the APA algorithm of order 2500 is shown
in Figure 22, and the amount of echo cancellation after adaptive filtering with APA, plotted
against the filter order, is shown in Figure 23.
Figure 22: Estimation error signal ‘e’ of APA at environment 1
6.1.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
The desired signal received by the microphone in environment 2 is shown in Figure 24. The
error signal estimated by adaptive filtering with the APA algorithm of order 2500 is shown
in Figure 25, and the amount of echo cancellation after adaptive filtering with APA, plotted
against the filter order, is shown in Figure 26.
Figure 26: ERLE of APA at environment 2
Figure 28: Estimation error signal ‘e’ of APA at environment 3
6.2 Echo cancellation using the NLMS algorithm
The fast convergence speed of the NLMS algorithm makes it a popular choice in echo
cancellation systems. The algorithm is tested in three different environments by changing
the room dimensions, microphone position and source position.
6.2.1 At environment 1: room [3,4,2.5], microphone [1,2,1], source [1,1,1]
The desired signal received by the microphone in environment 1 is shown in Figure 30. The
error signal estimated by adaptive filtering with the NLMS algorithm of order 2500 is shown
in Figure 31, and the amount of echo cancellation after adaptive filtering with NLMS,
plotted against the filter order, is shown in Figure 32.
Figure 31: Estimation error signal ‘e’ of NLMS at environment 1
6.2.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
The desired signal received by the microphone in environment 2 is shown in Figure 33. The
error signal estimated by adaptive filtering with the NLMS algorithm of order 2500 is shown
in Figure 34, and the amount of echo cancellation after adaptive filtering with NLMS,
plotted against the filter order, is shown in Figure 35.
Figure 35: ERLE of NLMS at environment 2
Figure 37: Estimation error signal ‘e’ of NLMS at environment 3
6.3 Echo cancellation using the RLS algorithm
The results of RLS show that the estimation error is very small, even smaller than for
NLMS and APA. Although RLS performs better than NLMS and APA, it was not preferred, because
each iteration requires a large number of multiplications. In echo cancellation systems the
FIR filter order is usually in the thousands, which gives a very large number of
multiplications and makes the implementation too costly.
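The source of this cost can be seen in a small Python sketch of a standard exponentially
weighted RLS update, given here only for illustration. It assumes the common initialization
P(0) = I/δ, which may differ from the one used in the experiments; the function name, the
four-tap echo path and the test signals are also assumptions. The N×N matrix P has to be
multiplied by the input vector and updated in every iteration, so the work per sample grows
with the square of the filter order N.

```python
import random

def rls(x, d, num_taps, lam=1.0, delta=0.1):
    """Exponentially weighted RLS; returns the final coefficient vector."""
    N = num_taps
    w = [0.0] * N
    # Assumed initialization: P(0) = I / delta.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(N)] for i in range(N)]
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(N)]
        # Matrix-vector product P u: about N*N multiplications per sample.
        Pu = [sum(P[i][j] * u[j] for j in range(N)) for i in range(N)]
        denom = lam + sum(ui * pui for ui, pui in zip(u, Pu))
        k = [v / denom for v in Pu]                       # gain vector
        e = d[n] - sum(wi * ui for wi, ui in zip(w, u))   # a priori error
        w = [wi + ki * e for wi, ki in zip(w, k)]
        # P <- (P - k u^T P) / lam; P is symmetric, so u^T P equals (P u)^T.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(N)] for i in range(N)]
    return w

# Illustrative echo path and far-end signal (assumptions).
rng = random.Random(3)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
h = [0.5, -0.3, 0.2, 0.1]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
w = rls(x, d, num_taps=4)
```

For a filter order of 2500 the matrix P alone holds 6.25 million entries, which gives a sense
of why the per-sample cost becomes prohibitive.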
Figure 40: Estimation error signal ‘e’ of RLS at environment 1
6.3.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
The desired signal received by the microphone in environment 2 is shown in Figure 42. The
error signal estimated by adaptive filtering with the RLS algorithm of order 2500 is shown
in Figure 43, and the amount of echo cancellation after adaptive filtering with RLS, plotted
against the filter order, is shown in Figure 44.
Figure 44: ERLE of RLS at environment 2
6.3.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
The desired signal received by the microphone in environment 3 is shown in Figure 45. The
error signal estimated by adaptive filtering with the RLS algorithm of order 2500 is shown
in Figure 46, and the amount of echo cancellation after adaptive filtering with RLS, plotted
against the filter order, is shown in Figure 47.
Figure 46: Estimation error signal ‘e’ of RLS at environment 3
6.4 Comparing ERLE of APA, NLMS and RLS in three environments
The amount of echo cancelled is measured as ERLE, the ratio of the power of the input
desired signal to the power of the estimated error signal immediately after echo
cancellation. It is expressed in dB and indicates the echo loss achieved by the adaptive
algorithm. A large value of ERLE indicates better echo cancellation.
ERLE is calculated for APA, NLMS and RLS in three different environments by varying the
room dimensions, microphone position and source position, and the results of the three
algorithms are presented in three ways, as shown below.
The results in the three environments show that RLS has a greater ERLE than APA and NLMS,
which indicates that echo cancellation using RLS is better than with APA and NLMS. Because
of its higher computational complexity, however, RLS takes longer to process than the APA
and NLMS algorithms. The ERLE of APA, RLS and NLMS is plotted and compared as graphs in
Figures 48, 50 and 52, and as charts in Figures 49, 51 and 53. The ERLE values with respect
to the filter order are also shown in Tables 2, 3 and 4. As the ERLE of APA and RLS does not
differ much, APA is the preferable adaptive algorithm in these environments because of the
computational complexity and costly implementation of RLS.
Figure 48: ERLE comparison of NLMS, APA and RLS at environment 1 in graph
Table 2: ERLE (dB) of NLMS, APA and RLS at environment 1
ORDER   NLMS    APA     RLS
500     4.25    6.93    9.87
1000    10.35   14.79   18.57
1500    20.44   23.44   27.89
2000    27.67   32.89   36.98
2500    28.45   47.17   56.15
6.4.2 At environment 2: room [2,3,2.5], microphone [1,1,1], source [1,2,1]
Table 3: ERLE (dB) of NLMS, APA and RLS at environment 2
ORDER   NLMS    APA     RLS
500     8.28    10.43   13.21
1000    18.78   21.52   25.42
1500    25.33   30.53   35.03
2000    33.02   41.41   46.68
2500    31.53   48.25   56.76
6.4.3 At environment 3: room [4,2,2], microphone [2,1,2], source [2,2,2]
Table 4: ERLE (dB) of NLMS, APA and RLS at environment 3
ORDER   NLMS    APA     RLS
500     11.36   13.21   16.74
1000    21.71   24.43   27.97
1500    29.36   33.76   38.64
2000    33.11   40.50   45.86
2500    29.20   47.99   58.53
7. Conclusion and future work
7.1 Summary
Advances in technology have produced a wide range of applications for acoustic echo
cancellation, including hands-free communication, mobile phones, Bluetooth headsets, Skype
calls and teleconferencing systems. In mobile technology and wireless systems, speech
enhancement and echo cancellation play an important role. Many echo cancellation methods are
used to suppress the echo, as mentioned in section 1; one of the most advanced is the
adaptive filtering technique. Many adaptive algorithms are available to suppress the echo in
hands-free communication. To choose the best among them, ERLE is the parameter used to
calculate and compare the performance of the adaptive algorithms in AEC.
7.2 Conclusion
This thesis is a collaborative work done by a group of two; the focus is on enhancing echo
cancellation in hands-free speech communication using adaptive filtering techniques. There
are many adaptive algorithms available in the literature for echo cancellation and every
algorithm has its own properties, but the aim of each algorithm is to achieve a higher ERLE
at a higher rate of convergence with less complexity.
The adaptive algorithms NLMS, APA and RLS for echo cancellation were
successfully implemented in MATLAB. The three algorithms were tested in three different
echo environments by changing the microphone position, source position and room
dimensions. The performance of the NLMS, APA and RLS algorithms is measured with the ERLE
parameter. The results show that the RLS algorithm gives the best echo cancellation and the
highest convergence speed among the three algorithms. Its highest ERLE values in the three
environments are 56.15 dB, 56.76 dB and 58.53 dB, but its computational complexity is higher
than that of the other algorithms. The amount of echo cancellation with the APA algorithm is
close to that of RLS, with less computational complexity and an easier real-time
implementation. Its highest ERLE values in the three environments are 47.17 dB, 48.25 dB and
47.99 dB. The NLMS algorithm has low computational complexity and is very easy to implement
in real time, but it gave the worst echo cancellation performance among the three. Its
highest ERLE values in the three environments are 28.45 dB, 33.02 dB and 33.11 dB. A
detailed comparison of the three algorithms in the three environments is shown in tables,
plots and graphs in section 6.4. The RLS algorithm gives the best results among the three
algorithms; still, it is not used because it requires a large number of multiplications per
iteration, and for echo cancellation systems the filter order is usually in the thousands in
real time. Thus the number of multiplications required is very large, making the RLS
algorithm too costly to implement.
8. Bibliography
[2] N. Grbic, “Optimal and Adaptive Subband Beamforming - Principles and Applications,”
Ph. D. dissertation, Dept. of Telecommunications and Signal Processing, Blekinge Institute
of Technology, Ronneby, SW, 2001.
[4] Lollmann, H. W.; Peter Vary; “Low Delay Noise Reduction and Dereverberation in
Hearing Aids,” EURASIP journal on Advances in Signal Process., Mar. 2009, Available:
http://delivery.acm.org/10.1145/1600000/1592486/p1lollmann.pdf?ip=194.47.147.33&acc=P
UBLI& CFID=76616061&CFTOKEN=13910873& acm
=1333990169_2f68c8c6972969074a9db89563e27bdc
[5] V. Valimaki and T. I. Laakso, “Principles of Fractional Delay Filters,” IEEE Int. Conf. on
Acoustic, Speech and Signal Proc., Istanbul, Turkey, 2000.
[7] J.B. Allen and D.A. Berkley, “Image Method for Efficiently Simulating Small Room
Acoustics,” Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
[9] Aditya Sri Teja .P, “Simulation of Microphone Inaccuracies and Multi-channel Speech
Enhancement using Beamformers in Reverberant Environment ,” M. S. Thesis, Dept. of
Signal Process., Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.
[11] S. Haykin, “Adaptive Filter Theory,” Prentice Hall, New Jersey, 1996.
[12] S. Qureshi, “Adaptive Equalization,” Proceedings of the IEEE, vol.73, No.9, pp.
1349-1387, Sept. 1985.
[13] Shoji Makino, Member, IEEE, Yutaka Kaneda, Member, IEEE and Nobuo Koizumi,
“Exponentially weighted step size NLMS adaptive filter based on the statistics of a room
impulse response”, IEEE Trans. on speech and audio Processing, vol. 1, No.1, pp.101-108,
Jan 1993.
[14] M.M. Sondhi, “An Adaptive Echo Canceller,” Bell Syst. Tech. J., vol. 46, No.3, pp.
497-511, Mar. 1967
[15] Hosien Asjadi, Mohammad Ababafha, “Adaptive Echo Cancellation Based On Third
Order Cumulant,” International Conference on Information, Communications and Signal
Processing, ICICS '97, Singapore, September 1997.
[17] Da-Zheng Feng, Xian-Da Zhang, Dong-Xia Chang, and Wei Xing Zheng, “A Fast
Recursive Total Least Squares Algorithm for Adaptive FIR Filtering”, IEEE Trans. On
Signal Processing vol.52, No.10, pp.2729-2737, Oct 2004
[18] Gupta .S, “Acoustic Echo Cancellation using Conventional Adaptive Algorithms and
modified Variable Step Size LMS Algorithm,” M. S. Thesis, Dept. of Electron. & Commun.
Eng., Thapar Inst. Of Eng. And Technology, Punjab, India, 2007
[21] Fukane, A.R.; Sahare, S. L.; “Enhancement of Noisy Speech Signals for Hearing
Aids,” 2011 Int. Conf. on Communication Systems and Network Technologies (CSNT),
pp. 490-494, Pune, IN, June 3-5.
[22] Elko, G. W.; Anh-Tho Nguyen Pong; “A Simple Adaptive First Order Differential
Microphone,” IEEE Applications of Signal Process. to Audio and Acoustics, 1995,
pp. 169-172.
[25] Amit Munjal, Vibha Aggarwal, Gurpal Singh; “RLS algorithm for Acoustic Echo
Cancellation,” 2008 National Conf. on Challenges & Opportunities in Information
Technology, pp. 301, Mandi Gobindgarh, IN, March 29.
[26] K. Ozeki and T. Umeda, “An adaptive filtering algorithm using an orthogonal
projection to an affine subspace and its properties,” Electronics and Communications in
Japan, 67-A(5):126-132, February 1984.
[27] Jin Woo Yoo and Poo Gyeon Park, “An Affine Projection Algorithm with Variable
Projection Order Using the MSE Criterion,” IMECS, Hong Kong, March 14-16, 2012.
[28] Paulo S. R. Diniz, “Adaptive Filtering Algorithms and Practical Implementation”,
Springer, July 2008.