Speech Processing
Course Code:
Course Credit: 3-0-4
Course Objectives:
● Provide students with knowledge of the basic characteristics of the speech signal in relation
to the production and hearing of speech by humans.
● Convey details of a range of commonly used speech feature extraction techniques.
● Enhance students' basic understanding of multidimensional techniques for speech
representation and of classification methods.
● Give an overview of practical aspects of speech processing, including robustness.
● Introduce applications of speech processing, including speech enhancement, speaker recognition
and speech recognition.
● Familiarize students with the practical aspects of implementing speech algorithms.
Course Content:
Theory
Module 1: 9 hours
Basic Concepts:
Speech Fundamentals: Articulatory Phonetics – Production and Classification of Speech Sounds;
Acoustic Phonetics – acoustics of speech production; Review of Digital Signal Processing
concepts; Short-Time Fourier Transform, Filter-Bank and LPC Methods.
Module 2: 9 hours
Speech Analysis:
Features, Feature Extraction and Pattern Comparison Techniques: Speech distortion measures –
mathematical and perceptual – Log Spectral Distance, Cepstral Distances, Weighted Cepstral
Distances and Filtering, Likelihood Distortions, Spectral Distortion using a Warped Frequency
Scale, LPC, PLP and MFCC Coefficients, Time Alignment and Normalization – Dynamic Time
Warping, Multiple Time-Alignment Paths.
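As an illustration of the Dynamic Time Warping topic in Module 2, here is a minimal sketch that aligns two feature sequences of different lengths. It assumes each sequence is a list of equal-length feature vectors (for example, MFCC frames), a Euclidean local distance, and the simple symmetric step pattern (match, insertion, deletion) with unit weights; the function name `dtw_distance` and the toy data are hypothetical, and the course texts cover other local path constraints and slope weightings.

```python
import math

def dtw_distance(x, y):
    """Accumulated DTW cost between feature sequences x and y.

    x, y: lists of equal-length feature vectors (e.g. MFCC frames).
    Local cost: Euclidean distance between frames.
    Allowed steps: (1,0), (0,1), (1,1), all with unit weight (an assumption).
    """
    n, m = len(x), len(y)
    # D[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # x advances, y holds
                                 D[i][j - 1],      # y advances, x holds
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]

# Toy one-dimensional "feature" sequences: same contour, different timing
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
utterance = [[0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
print(dtw_distance(template, utterance))  # 0.0: warping absorbs the timing difference
```

The full cost matrix is kept for clarity; a practical recognizer would usually add a global path constraint (such as a band around the diagonal) and length normalization before comparing templates.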
Module 3: 9 hours
Speech Modelling:
Hidden Markov Models: Markov Processes, HMMs – Evaluation, Optimal State Sequence –
Viterbi Search, Baum-Welch Parameter Re-estimation, Implementation issues.
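The Viterbi search listed in Module 3 can be sketched for a discrete-observation HMM as follows. This assumes initial probabilities `pi`, a transition matrix `A` and an emission matrix `B` stored as plain nested lists; the function name `viterbi` and the toy model numbers are hypothetical. The recursion runs in the log domain, one common answer to the numerical underflow problem that appears under implementation issues.

```python
import math

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete HMM (log-domain Viterbi).

    obs: list of observation symbol indices
    pi[i]: initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    Returns (best_path, best_log_prob).
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    N = len(pi)
    # delta[i]: best log-probability of any path ending in state i at the current time
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    backptr = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(N):
            # Best predecessor of state j under the previous time step's scores
            best_i = max(range(N), key=lambda i: delta[i] + log(A[i][j]))
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[j][o]))
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy 2-state model with 3 observation symbols (illustrative numbers)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, logp = viterbi([0, 1, 2], pi, A, B)
print(path, math.exp(logp))  # path == [0, 0, 1]
```

Replacing `max` with a log-sum over predecessors turns the same recursion into the forward algorithm used for the evaluation problem.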
Module 4: 9 hours
Speech Recognition:
Large Vocabulary Continuous Speech Recognition: Architecture of a large vocabulary
continuous speech recognition system – acoustic and language models – n-grams, context-dependent
sub-word units; Applications and present status.
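The n-gram language models in Module 4 can be illustrated with a bigram model. The sketch below uses maximum-likelihood estimates, P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}), with sentence-boundary markers `<s>` and `</s>`; the function names and the three-sentence corpus are hypothetical. It applies no smoothing, so any unseen bigram gets probability zero, whereas a practical large-vocabulary system would use smoothing or back-off.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model (no smoothing)."""
    history, pairs = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        history.update(words[:-1])               # counts C(w_{i-1}) as a history
        pairs.update(zip(words[:-1], words[1:]))  # counts C(w_{i-1} w_i)

    def prob(prev, word):
        return pairs[(prev, word)] / history[prev] if history[prev] else 0.0
    return prob

def sentence_prob(prob, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        p *= prob(prev, word)
    return p

corpus = ["call home", "call the office", "go home"]
prob = train_bigram(corpus)
print(prob("call", "home"))            # 0.5
print(sentence_prob(prob, "go home"))  # 1/3
```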
Module 5: 9 hours
Speech Synthesis:
Text-to-Speech Synthesis: Concatenative and waveform synthesis methods, sub-word units for
TTS, intelligibility and naturalness – role of prosody, Applications and present status.
Course Outcomes:
● Discuss the processes of speech production and perception.
● Express the speech signal in terms of its time-domain and frequency-domain
representations.
● Describe the different ways in which the speech signal can be modeled.
● Analyze speech for automatic recognition and extraction of information.
● Design and implement algorithms for processing speech signals.
● Build a simple speech recognition system.
Text Books:
1. Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition”,
Pearson Education, 2003.
2. Daniel Jurafsky and James H. Martin, “Speech and Language Processing – An
Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition”, Pearson Education.
3. K. Sayood, “Introduction to Data Compression”, 2nd Ed., Morgan Kaufmann, 2000.
Reference Books:
1. D. O'Shaughnessy, “Speech Communications: Human and Machine”, 2nd Ed., IEEE Press, 2000.
2. A. Gersho and R. M. Gray, “Vector Quantization and Signal Compression”, Kluwer Academic,
1991.
Practicals
List of Experiments: