03 MFCC
Introduction
Topic: Spectrogram, Cepstrum
and Mel-Frequency Analysis
Kishore Prahallad
Email: [email protected]
Speech Technology - Kishore Prahallad ([email protected])
Spectrogram
Speech signal represented as a sequence of spectral vectors
• The speech signal is cut into short analysis windows, and an FFT is taken of each window to give one spectrum per window
• [Figure: spectrum of one window; axes: frequency (Hz) vs. amplitude]
• Rotate each spectrum by 90 degrees so that frequency (Hz) runs along the vertical axis, and place the spectra side by side along the time axis
• This time vs. frequency representation of a speech signal is referred to as a spectrogram
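A minimal sketch of the "one FFT per window" idea in Python with NumPy. The function name `spectrogram`, the window length, the hop size and the Hann window are all assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) array of magnitude spectra."""
    window = np.hanning(frame_len)  # tapering window (an assumption, not from the slides)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    # One FFT per frame: each row is the spectral vector of one analysis window.
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)                       # rows = time, columns = frequency
peak_hz = np.argmax(spec, axis=1) * fs / 256   # strongest frequency in each frame
```

Plotting `spec.T` as an image (time on the horizontal axis, frequency on the vertical axis) gives the spectrogram; for this tone every frame peaks at 1000 Hz.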
Some Real Spectrograms
Why we care about spectrograms
• Phones and their properties are better observed in a spectrogram
• Sounds can be identified much better by the formants and by their transitions
• Hidden Markov Models implicitly model these spectrograms to perform speech recognition
Usefulness of Spectrogram
• Time-Frequency representation of the speech signal
Cepstral Analysis
A Sample Speech Spectrum
[Figure: a sample speech spectrum; axes: frequency (Hz) vs. magnitude (dB)]
Spectral Envelope
• The spectrum log X[k] can be viewed as the sum of two parts:
– the spectral envelope log H[k]
– the spectral details log E[k]
• log X[k] = log H[k] + log E[k]
• Equivalently, before taking the log, X[k] = H[k] · E[k]; the log turns this product into a sum
Play a Mathematical Trick
• Trick: take the FFT of the spectrum!
• An FFT applied to a spectrum is referred to as an inverse FFT (IFFT)
• Note: we are dealing with the spectrum in the log domain (part of the trick)
• The IFFT of the log spectrum represents the signal on a pseudo-frequency axis
• Spectral envelope: it varies slowly along the frequency axis, so treat it as a sine wave with 4 cycles per sec.; its IFFT gives a peak at 4 Hz on the pseudo-frequency axis (low-frequency region)
• Spectral details: they vary rapidly, so treat them as a sine wave with 100 cycles per sec.; the IFFT gives a peak at 100 Hz on the pseudo-frequency axis (high-frequency region)
• Taking the IFFT of log X[k] = log H[k] + log E[k] gives x[k] = h[k] + e[k]
• h[k] (spectral envelope) lies in the low-frequency region of x[k] and e[k] (spectral details) in the high-frequency region, so the two can be separated on the pseudo-frequency axis
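The trick can be tried on a synthetic log spectrum. The sketch below is only an illustration under stated assumptions: the "envelope" and "details" are idealized as pure cosines with 4 and 100 cycles across the axis, and the names `log_H`, `log_E`, `cutoff` and `lifter` are hypothetical.

```python
import numpy as np

N = 1024                                       # points in the synthetic log spectrum
k = np.arange(N)
log_H = 2.0 * np.cos(2 * np.pi * 4 * k / N)    # slow variation: spectral envelope
log_E = 0.5 * np.cos(2 * np.pi * 100 * k / N)  # fast variation: spectral details
log_X = log_H + log_E                          # log X[k] = log H[k] + log E[k]

# IFFT of the log spectrum: x[k] = h[k] + e[k] on the pseudo-frequency axis.
x = np.fft.ifft(log_X).real

# The two largest values in the first half of x sit at indices 4 and 100.
peaks = sorted(int(i) for i in np.argsort(np.abs(x[: N // 2]))[-2:])

# Keep only the low region (and its mirror image) to isolate h[k],
# then go back with an FFT to recover the envelope.
cutoff = 20
lifter = np.zeros(N)
lifter[:cutoff] = 1.0
lifter[-(cutoff - 1):] = 1.0
log_H_est = np.fft.fft(x * lifter).real
```

Here `log_H_est` matches `log_H` because the envelope and the details occupy well-separated regions of the pseudo-frequency axis; with real speech the separation is only approximate and the cutoff has to be chosen.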
Mel-Frequency Analysis
Review: What we did
• We captured the spectral envelope (the curve connecting all formants)
• BUT: perceptual experiments show that the human ear concentrates on certain regions rather than using the whole spectral envelope
[Figure: speech spectrum; axes: frequency (Hz) vs. magnitude (dB)]
Mel-Frequency Analysis
• Mel-frequency analysis of speech is based on human perception experiments
• It is observed that the human ear acts as a filter
– It concentrates on only certain frequency components
• These filters are non-uniformly spaced on the frequency axis
– More filters in the low-frequency regions
– Fewer filters in the high-frequency regions
Mel-Frequency Filters
[Figure: Mel-frequency filters along the frequency axis; more filters in the low-frequency region, fewer filters in the high-frequency region]
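The non-uniform spacing can be illustrated by placing filter centres at equal steps on a Mel scale and mapping them back to Hz. The conversion formula used here (2595 · log10(1 + f/700)) is one widely used convention and is an assumption, since the slides do not give a formula; the function names and the 20-filter, 0 to 4000 Hz setup are likewise hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Hz -> Mel, using one common convention (an assumption, not from the slides)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(num_filters=20, f_min=0.0, f_max=4000.0):
    """Centre frequencies (Hz) of filters equally spaced on the Mel axis."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), num_filters + 2)
    return mel_to_hz(mels[1:-1])   # drop the two edge points

centers = mel_center_frequencies()
low = int(np.sum(centers < 2000.0))    # filters centred in the lower half of the band
high = int(np.sum(centers >= 2000.0))  # filters centred in the upper half of the band
```

With these settings `low` is larger than `high`, and the gap between neighbouring centres grows with frequency, which is the "more filters at low frequencies" behaviour described on the slide.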
Mel-Frequency Cepstral Coefficients (MFCC)
• Spectrum → Mel-Filters → Mel-Spectrum
• Say log X[k] = log (Mel-Spectrum)
• NOW perform cepstral analysis on log X[k]
– log X[k] = log H[k] + log E[k]
– Taking the IFFT
– x[k] = h[k] + e[k]
• The cepstral coefficients h[k] obtained for the Mel-spectrum are referred to as Mel-Frequency Cepstral Coefficients, often denoted by MFCC
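A hedged sketch of the last step for a single frame: log-compress a Mel-spectrum and keep the first few coefficients of its inverse transform. Note the swap: the slides say IFFT, while this sketch uses an unnormalised DCT-II, which is a common practical stand-in for that step. The function name, the choice of 13 coefficients, and the example Mel-spectrum are assumptions.

```python
import numpy as np

def cepstral_coefficients(mel_spectrum, num_coeffs=13):
    """Log-compress a Mel-spectrum and transform it with a DCT-II."""
    log_mel = np.log(mel_spectrum + 1e-10)      # small offset avoids log(0)
    M = len(log_mel)
    n = np.arange(num_coeffs)[:, None]          # cepstral (output) index
    m = np.arange(M)[None, :]                   # Mel-filter (input) index
    basis = np.cos(np.pi * n * (m + 0.5) / M)   # unnormalised DCT-II basis
    # Keeping only the first num_coeffs rows retains the low region, i.e. h[k].
    return basis @ log_mel

# Hypothetical Mel-spectrum with 26 filter outputs, strongest at low frequencies.
mel_spectrum = np.linspace(5.0, 1.0, 26) ** 2
mfcc_vector = cepstral_coefficients(mel_spectrum)   # one 13-dimensional vector
```

Truncating to the first few coefficients plays the role of keeping the low region of the pseudo-frequency axis, so the result describes the envelope of the Mel-spectrum rather than its fine details.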
Speech signal represented as a sequence of CEPSTRAL vectors
[Figure: each analysis window passes through FFT → Spectrum → Mel-Filters → Cepstral Analysis, giving one cepstral vector per window]
Why we are going to use MFCC
• Used in speech synthesis
– For joining two speech segments S1 and S2
– Represent S1 as a sequence of MFCC vectors
– Represent S2 as a sequence of MFCC vectors
– Join at the point where the MFCCs of S1 and S2 have minimal Euclidean distance
• Used in speech recognition
– MFCCs are the most widely used features in state-of-the-art speech recognition systems
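One possible reading of the join rule, sketched in Python: compare every MFCC vector of S1 with every MFCC vector of S2 and pick the pair with the smallest Euclidean distance. The function name `best_join` and the tiny two-dimensional "MFCC" arrays are made up for illustration; a real system would likely restrict the search to frames near the segment boundaries.

```python
import numpy as np

def best_join(mfcc_s1, mfcc_s2):
    """Return (i, j, distance) for the closest pair of frames S1[i], S2[j]."""
    diff = mfcc_s1[:, None, :] - mfcc_s2[None, :, :]   # all pairwise differences
    dist = np.sqrt(np.sum(diff ** 2, axis=2))          # Euclidean distance matrix
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])

# Toy sequences: three frames each, two coefficients per frame (hypothetical data).
s1 = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
s2 = np.array([[9.0, 9.0], [5.0, 4.0], [0.0, 3.0]])

i, j, d = best_join(s1, s2)   # frame 2 of S1 and frame 1 of S2, distance 1.0
```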
Summary: Process of Feature Extraction
• Speech is analyzed over short analysis windows
• For each short analysis window, a spectrum is obtained using the FFT
• The spectrum is passed through Mel-filters to obtain the Mel-spectrum
• Cepstral analysis is performed on the Mel-spectrum to obtain Mel-Frequency Cepstral Coefficients
• Thus speech is represented as a sequence of cepstral vectors
• It is these cepstral vectors that are given to pattern classifiers for speech recognition
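The steps in the summary can be strung together as follows. This is a sketch under several assumptions, not a reference implementation: the triangular filter shape, the Hz/Mel formula, the Hamming window, the DCT-II in place of the IFFT, and all names and parameter values are choices made here for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # one common Mel convention (assumed)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(num_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for b in range(left, centre):            # rising edge of the triangle
            fbank[i, b] = (b - left) / (centre - left)
        for b in range(centre, right):           # falling edge of the triangle
            fbank[i, b] = (right - b) / (right - centre)
    return fbank

def mfcc(signal, fs, frame_len=256, hop=128, num_filters=20, num_coeffs=13):
    """Return a (num_frames, num_coeffs) array of cepstral vectors."""
    # 1. Short analysis windows.
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    # 2. Spectrum of each window via the FFT.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Mel-filters -> Mel-spectrum.
    mel_spec = power @ mel_filterbank(num_filters, frame_len, fs).T
    # 4. Cepstral analysis: log, then DCT-II (stand-in for the IFFT step).
    log_mel = np.log(mel_spec + 1e-10)
    n = np.arange(num_coeffs)[:, None]
    m = np.arange(num_filters)[None, :]
    basis = np.cos(np.pi * n * (m + 0.5) / num_filters)
    return log_mel @ basis.T

# Hypothetical input: one second of noise at 8 kHz standing in for speech.
fs = 8000
rng = np.random.default_rng(0)
signal = rng.standard_normal(fs)
features = mfcc(signal, fs)   # one cepstral vector per analysis window
```

Each row of `features` is the cepstral vector for one analysis window; a sequence of such rows is what would be handed to a pattern classifier.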
Additional Reading
• Chapter 6
– Pg: 273 – 281
– Pg: 304 – 311
– Pg: 314 - 316