Speech Coder


mohitgoel4u.net(Mr. Feb)

Speech Processing

Speech Coding Techniques


Waveform-approximating coders: Speech coders producing
a reconstructed signal which converges towards the original
signal with decreasing quantization error.

Parametric coders: Speech coders producing a
reconstructed signal which does not converge to the
original signal with decreasing quantization error.

1. Parametric Coders
Parametric coders model the speech signal using a set
of model parameters like spectral envelope, pitch and
energy contour, etc.
The extracted parameters at the encoder are quantized
and transmitted to the decoder. The decoder
synthesizes speech according to the specified model.

The speech production model does not account for the
quantization noise or try to preserve the waveform
similarity between the synthesized and the original
speech signals.

The model parameter estimation may be an open-loop
process with no feedback from the quantization or the
speech synthesis.

Furthermore, because parametric coders do not preserve
waveform similarity, measuring the signal-to-noise ratio
(SNR) is meaningless: the SNR often becomes negative
when expressed in dB, as the input and output
waveforms may not be phase-aligned.
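
The effect of phase misalignment on SNR can be checked numerically. The sketch below is only an illustration under assumed values (an 8 kHz sampling rate and a 200 Hz test tone, neither of which comes from the slides): the "decoded" signal is the same tone with a different starting phase, so it would sound identical, yet its waveform SNR is poor or negative.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed telephone-band rate)
TONE_HZ = 200.0     # illustrative test tone
NUM_SAMPLES = 800   # 100 ms, a whole number of tone periods


def tone(phase):
    """Unit-amplitude sinusoid with the given starting phase in radians."""
    step = 2.0 * math.pi * TONE_HZ / SAMPLE_RATE
    return [math.sin(step * n + phase) for n in range(NUM_SAMPLES)]


def snr_db(original, decoded):
    """Waveform SNR in dB: original energy over error energy."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)


original = tone(0.0)
slightly_shifted = tone(math.radians(10.0))  # same tone, phase offset of 10 degrees
inverted = tone(math.pi)                     # same tone, half a cycle off

print(f"10 degree shift:  {snr_db(original, slightly_shifted):6.2f} dB")
print(f"180 degree shift: {snr_db(original, inverted):6.2f} dB")
```

For a pure tone the error amplitude is 2 sin(phase/2), so the 10 degree offset gives roughly 15 dB and the half-cycle offset gives about -6 dB, even though nothing audible has changed.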

1.1 Linear Prediction Based Vocoders

Linear Prediction (LP) based vocoders are designed
to emulate the human speech production
mechanism.

The vocal tract is modelled by a linear prediction filter.


The glottal pulses and turbulent air flow at the glottis
are modelled by periodic pulses and Gaussian noise
respectively, which form the excitation signal of the
linear prediction filter.

The LP filter coefficients, signal power, binary voicing
decision (i.e. periodic pulses or noise excitation), and
pitch period of the voiced segments are estimated for
transmission to the decoder.
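
A minimal sketch of the decoder side of such a vocoder is given below. It assumes the sign convention s[n] = gain * e[n] + sum of a[k] * s[n-k], uses a gain in place of the transmitted signal power, and takes made-up coefficient, gain and pitch values; none of these specifics come from the slides.

```python
import random


def make_excitation(num_samples, voiced, pitch_period, seed=0):
    """Binary voicing: periodic pulses if voiced, Gaussian noise otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def lp_synthesis(excitation, lp_coeffs, gain):
    """All-pole filter: s[n] = gain * e[n] + sum_k lp_coeffs[k] * s[n-1-k]."""
    output = []
    for n, e in enumerate(excitation):
        sample = gain * e
        for k, coeff in enumerate(lp_coeffs):
            if n - 1 - k >= 0:
                sample += coeff * output[n - 1 - k]
        output.append(sample)
    return output


# Hypothetical decoded parameters for one 20 ms frame at 8 kHz.
LP_COEFFS = [1.3, -0.8]  # illustrative second-order filter (stable)
GAIN = 0.5
PITCH_PERIOD = 80        # samples, i.e. 100 Hz at 8 kHz

voiced_frame = lp_synthesis(make_excitation(160, True, PITCH_PERIOD), LP_COEFFS, GAIN)
unvoiced_frame = lp_synthesis(make_excitation(160, False, PITCH_PERIOD), LP_COEFFS, GAIN)
print(voiced_frame[:3])
```

A real LP vocoder would typically use a higher filter order and carry the filter memory across frames; both are omitted here to keep the sketch short.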

The main weakness of LP based vocoders is the binary
voicing decision of the excitation, which fails to model
mixed signal types with both periodic and noisy
components.

1.2 Harmonic Coders


Harmonic or sinusoidal coding represents the speech
signal as a sum of sinusoidal components. The model
parameters, i.e. the amplitudes, frequencies and
phases of sinusoids, are estimated at regular intervals
from the speech spectrum.

The frequency tracks are extracted from the peaks
of the speech spectra, and the amplitudes and
frequencies are interpolated in the synthesis
process for smooth evolution.
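
The synthesis step can be sketched as a sum of oscillators whose amplitude and frequency are linearly interpolated across each frame. The function name, the linear interpolation rule and the example track values below are assumptions made for illustration; practical coders may interpolate differently.

```python
import math


def synthesize_frame(amps_start, amps_end, freqs_start, freqs_end,
                     phases, frame_len, sample_rate):
    """Sum of sinusoids with per-sample interpolated amplitude and frequency.

    Returns the frame samples and the running phase of each track, so the
    next frame can continue without a phase jump.
    """
    output = [0.0] * frame_len
    end_phases = []
    tracks = zip(amps_start, amps_end, freqs_start, freqs_end, phases)
    for a0, a1, f0, f1, phase in tracks:
        for n in range(frame_len):
            t = n / frame_len
            amp = a0 + (a1 - a0) * t
            freq = f0 + (f1 - f0) * t
            output[n] += amp * math.cos(phase)
            phase += 2.0 * math.pi * freq / sample_rate
        end_phases.append(phase)
    return output, end_phases


# Hypothetical tracks: three harmonics while the pitch glides from 100 Hz to 110 Hz.
frame, next_phases = synthesize_frame(
    amps_start=[0.5, 0.3, 0.1], amps_end=[0.4, 0.3, 0.2],
    freqs_start=[100.0, 200.0, 300.0], freqs_end=[110.0, 220.0, 330.0],
    phases=[0.0, 0.0, 0.0], frame_len=160, sample_rate=8000)
print(len(frame), frame[0])
```

Here the tracks happen to be harmonics of a common pitch, but nothing in the function requires that.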

The general sinusoidal model does not restrict the
frequency tracks to be harmonics of the fundamental
frequency.

If the parameters are unquantized, increasing the
parameter extraction rate makes the synthesized
speech waveform converge towards the original.

However, at low bit rates the phases are not
transmitted; they are estimated at the decoder instead.

2. Waveform-approximating Coders

Waveform coders minimize the error between the
synthesized and the original speech waveforms.

Examples of this type of coder are Pulse Code
Modulation (PCM) and Adaptive Differential Pulse Code
Modulation (ADPCM).

PCM transmits a quantized value for each speech
sample.
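
As a rough illustration of per-sample quantization, the sketch below uses a plain uniform quantizer on samples in [-1, 1). This is a simplification chosen for clarity: telephone PCM normally uses logarithmic companding rather than uniform steps, and the test tone and bit depths are arbitrary. It also shows the defining property of a waveform-approximating coder: the SNR rises as the quantization error shrinks.

```python
import math


def pcm_encode(sample, bits):
    """Uniform quantizer: map a sample in [-1, 1) to an integer code."""
    levels = 1 << bits
    code = math.floor((sample + 1.0) / 2.0 * levels)
    return min(max(code, 0), levels - 1)  # clamp out-of-range input


def pcm_decode(code, bits):
    """Reconstruct the sample at the centre of the quantization cell."""
    levels = 1 << bits
    return (code + 0.5) * 2.0 / levels - 1.0


def snr_db(original, decoded):
    """Waveform SNR in dB: original energy over error energy."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)


# Illustrative input: 100 ms of a 200 Hz tone sampled at 8 kHz.
signal = [0.9 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(800)]

for bits in (4, 8, 12):
    decoded = [pcm_decode(pcm_encode(x, bits), bits) for x in signal]
    print(f"{bits:2d} bits/sample: SNR = {snr_db(signal, decoded):.1f} dB")
```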

ADPCM employs an adaptive pole-zero predictor and
quantizes the error signal with an adaptive quantizer
step size. The ADPCM predictor coefficients and the
quantizer step size are backward adaptive and updated
at the sampling rate.
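
The idea of backward adaptation can be sketched with a much simplified codec. The version below is not ADPCM as standardized: it assumes a fixed first-order predictor instead of an adaptive pole-zero predictor, a 3-bit code, and made-up step-size multipliers. What it does show is that the step size is updated every sample from the transmitted code alone, so the decoder can track the encoder without any side information.

```python
import math

CODE_MIN, CODE_MAX = -4, 3      # 3-bit codes (assumed)
STEP_MIN, STEP_MAX = 1e-4, 0.5  # illustrative limits on the step size


class BackwardAdaptiveCodec:
    """Toy differential codec with a backward-adaptive quantizer step."""

    def __init__(self, predictor_coeff=0.9, initial_step=0.05):
        self.predictor_coeff = predictor_coeff  # fixed here, adaptive in real ADPCM
        self.step = initial_step
        self.previous = 0.0                     # last reconstructed sample

    def _predict(self):
        return self.predictor_coeff * self.previous

    def _reconstruct_and_adapt(self, code):
        reconstructed = self._predict() + code * self.step
        # The update depends only on the code, which both ends know.
        if abs(code) >= 3:
            self.step *= 1.5   # large codes: quantizer overloaded, widen
        elif abs(code) <= 1:
            self.step *= 0.9   # small codes: quantizer too coarse, narrow
        self.step = min(max(self.step, STEP_MIN), STEP_MAX)
        self.previous = reconstructed
        return reconstructed

    def encode(self, sample):
        code = round((sample - self._predict()) / self.step)
        code = min(max(code, CODE_MIN), CODE_MAX)
        self._reconstruct_and_adapt(code)
        return code

    def decode(self, code):
        return self._reconstruct_and_adapt(code)


signal = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(400)]
encoder, decoder = BackwardAdaptiveCodec(), BackwardAdaptiveCodec()
codes = [encoder.encode(x) for x in signal]
decoded = [decoder.decode(c) for c in codes]
print(codes[:10])
```

After the loop the encoder and decoder hold the same step size and the same last reconstructed sample, which is the point of making the adaptation backward rather than forward.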

More recent waveform-approximating coders based on
time-domain analysis-by-synthesis, such as Code Excited
Linear Prediction (CELP), explicitly make use of a
vocal tract model and long-term prediction.

CELP coders buffer the speech signal, perform
block-based analysis, and transmit the prediction
filter coefficients along with an index for the
excitation vector.

They also employ perceptual weighting so that the
quantization noise spectrum is masked by the
signal level.
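
The analysis-by-synthesis search behind the excitation index can be sketched as follows. This is a bare-bones illustration under several assumptions: a random Gaussian codebook, a zero-state synthesis filter, a hypothetical second-order filter, and no perceptual weighting or long-term predictor. Each candidate vector is passed through the synthesis filter, scaled by its best gain, and compared with the target block.

```python
import random


def synthesize(excitation, lp_coeffs):
    """Zero-state all-pole filter: s[n] = e[n] + sum_k lp_coeffs[k] * s[n-1-k]."""
    output = []
    for n, e in enumerate(excitation):
        sample = e
        for k, coeff in enumerate(lp_coeffs):
            if n - 1 - k >= 0:
                sample += coeff * output[n - 1 - k]
        output.append(sample)
    return output


def search_codebook(target, codebook, lp_coeffs):
    """Return (index, gain, error) of the vector that best matches the target."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lp_coeffs)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue
        gain = sum(t * y for t, y in zip(target, synth)) / energy  # least-squares gain
        error = sum((t - gain * y) ** 2 for t, y in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(1)
LP_COEFFS = [1.3, -0.8]  # hypothetical quantized filter coefficients
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]

# Build a target that entry 5 can reproduce exactly, to check the search.
target = [2.0 * y for y in synthesize(codebook[5], LP_COEFFS)]
index, gain, error = search_codebook(target, codebook, LP_COEFFS)
print(index, round(gain, 6))
```

Only the winning index and a quantized gain would need to be sent alongside the filter coefficients, since the decoder holds the same codebook.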

3. Hybrid Coding of Speech


When the bit rate is reduced, the perceived quality of
Adaptive Differential Pulse Code Modulation (ADPCM)
and Code Excited Linear Prediction (CELP) coders tends
to degrade more for some speech segments while
remaining adequate for others. This shows that the
assumed coding principle is not adequate for all speech
types.

In order to circumvent this problem, hybrid coders that
combine different coding principles to encode different
types of speech segments have been introduced. A
hybrid coder can switch between a set of predefined
coding modes, and such coders are therefore also
referred to as multimode coders. A hybrid coder is an
adaptive coder that can change the coding technique
or mode according to the source, selecting the best mode.
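
One simple way to picture mode switching is a per-frame classifier that picks a coding mode from basic signal features. The sketch below is purely hypothetical: the mode names, the use of frame energy and zero-crossing rate, and the thresholds are assumptions for illustration, not a description of any particular multimode coder.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def select_mode(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Pick a (hypothetical) coding mode for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return "silence"    # e.g. send only a noise-level update
    if zero_crossing_rate(frame) > zcr_threshold:
        return "unvoiced"   # noise-like segment
    return "voiced"         # periodic segment


rng = random.Random(0)
silent = [0.0] * 160
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(160)]
noise = [rng.gauss(0.0, 0.1) for _ in range(160)]

for name, frame in (("silent", silent), ("tone", tone), ("noise", noise)):
    print(name, "->", select_mode(frame))
```

A real hybrid coder would attach a different encoder to each mode and would usually smooth the decisions over time to avoid rapid switching.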

Requirements of speech coders


1. Quality and Capacity

Speech quality and bit rate are two factors that directly
conflict with each other.

Lowering the bit rate of the speech coder, i.e. using
higher signal compression, degrades the quality to a
certain extent; increasing the bit rate improves the
quality, but more data has to be stored or
transmitted.
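
The capacity side of this trade-off is simple arithmetic. The helper names below are made up for illustration; the 8 kHz, 8-bit case corresponds to the familiar 64 kbit/s telephone PCM rate, while the 48-bit, 20 ms frame is a hypothetical parametric coder configuration.

```python
def sample_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bit/s for a coder that sends a code per sample."""
    return sample_rate_hz * bits_per_sample


def frame_bit_rate(bits_per_frame, frame_ms):
    """Bit rate in bit/s for a coder that sends a block of bits per frame."""
    return bits_per_frame * 1000.0 / frame_ms


def storage_bytes(bit_rate, seconds):
    """Bytes needed to store the given duration at the given bit rate."""
    return bit_rate * seconds / 8.0


pcm_rate = sample_bit_rate(8000, 8)     # 64000 bit/s
adpcm_rate = sample_bit_rate(8000, 4)   # 32000 bit/s
vocoder_rate = frame_bit_rate(48, 20)   # 2400.0 bit/s (hypothetical frame layout)

for name, rate in (("PCM", pcm_rate), ("ADPCM", adpcm_rate), ("vocoder", vocoder_rate)):
    print(f"{name}: {rate:.0f} bit/s, {storage_bytes(rate, 60):.0f} bytes per minute")
```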

For systems that connect to the Public Switched
Telephone Network (PSTN) and associated systems, the
quality requirements are strict and must conform to
constraints and guidelines imposed by the relevant
regulatory bodies, e.g. ITU (previously CCITT). Such
systems demand high quality (toll quality) coding.

Private commercial networks and military systems may
compromise the quality to lower the capacity
requirements.

2. Coding Delay

Coding delay may be algorithmic (the buffering of speech
for analysis), computational (the time taken to process
the stored speech samples) or due to transmission.

Only the first two concern the speech coding
subsystem, although very often the coding scheme is
tailored such that transmission can be initiated even
before the algorithm has completed processing all of
the information in the analysis frame.
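
The two coder-side contributions can be added up as in the sketch below. The function names, the 160-sample frame and the 5 ms processing time are illustrative assumptions; the calculation also takes the simple view that processing starts only once the whole frame is buffered.

```python
def algorithmic_delay_ms(frame_samples, sample_rate_hz):
    """Time spent buffering one analysis frame, in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


def coding_delay_ms(frame_samples, sample_rate_hz, processing_ms):
    """Buffering delay plus the time taken to process the buffered frame."""
    return algorithmic_delay_ms(frame_samples, sample_rate_hz) + processing_ms


# Hypothetical coder: 160-sample frames at 8 kHz, 5 ms of processing per frame.
print(algorithmic_delay_ms(160, 8000))   # buffering only
print(coding_delay_ms(160, 8000, 5.0))   # buffering plus processing
```

Transmission delay comes on top of this figure and depends on the network rather than on the coder.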

For PSTN applications, low delay is essential if the major
problem of echo is to be minimized. So, extra echo
cancellers will be required if coders with long delays are
introduced.

3. Robustness

For many applications, the speech source coding rate
typically occupies only a fraction of the total channel
capacity, the rest being used for forward error
correction (FEC) and signalling.

For mobile connections, which suffer greatly from
both random and burst errors, a coding scheme's
built-in tolerance to channel errors is vital for an
acceptable average overall performance, i.e.
communication quality.

For other applications employing less severe channels,
e.g. fibre-optic links, the problems due to channel errors
are reduced significantly and robustness can be traded
for higher clean-channel speech quality. This is a major
difference between wireless mobile systems and
fixed-link systems.

In addition to the channel noise, coders may need to
operate in noisy background environments. As
background noise can degrade the performance of
speech parameter extraction, it is crucial that the coder
is designed in such a way that it can maintain good
performance at all times.

4. Complexity and Cost

As more sophisticated algorithms are devised, the
computational complexity is increased.

One technique for reducing power consumption
whilst also improving channel efficiency is digital speech
interpolation (DSI). DSI exploits the fact that only
around half of a speech conversation is actually active
speech; thus, during inactive periods, the channel can
be used for other purposes, including limiting the
transmitter activity, hence saving power.

An important subsystem of DSI is the voice activity
detector (VAD) which must operate efficiently and
reliably to ensure that real speech is not mistaken for
silence and vice versa.
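
A very basic VAD can be sketched as a frame-energy threshold with a short hangover, so that the quiet tail of a word is not clipped. The threshold, frame length and hangover count below are assumed values for illustration; practical detectors use more features and adapt to the background noise level.

```python
import math


def frame_energy_db(frame):
    """Mean frame energy in dB; the small offset avoids log of zero."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


def detect_activity(samples, frame_len=160, threshold_db=-40.0, hangover_frames=2):
    """Return one True/False activity decision per complete frame."""
    decisions = []
    hangover = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy_db(frame) > threshold_db:
            decisions.append(True)
            hangover = hangover_frames   # restart the hangover counter
        elif hangover > 0:
            decisions.append(True)       # keep the channel open briefly
            hangover -= 1
        else:
            decisions.append(False)
    return decisions


# Illustrative input: 3 silent frames, 2 frames of tone, 5 silent frames.
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(320)]
samples = [0.0] * 480 + tone + [0.0] * 800

decisions = detect_activity(samples)
activity = sum(decisions) / len(decisions)
print(decisions)
print(f"active fraction: {activity:.1f}")
```

With these settings the two tone frames and the two hangover frames after them are marked active, so the channel would be free for the remaining six frames.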
