Speech Processing

Course Name: Speech Processing

Course Code:
Course Credit: 3-0-4

Course Objectives:
● Provide students with the knowledge of basic characteristics of speech signal in relation
to production and hearing of speech by humans
● Convey details of a range of commonly used speech feature extraction techniques.
● Enhance the basic understanding of multidimensional techniques for speech
representation and classification.
● Give an overview of practical aspects of speech processing, including robustness.
● Introduce applications of speech processing, including speech enhancement, speaker
recognition and speech recognition.
● Familiarize students with the practical aspects of implementing speech algorithms.

Course Content:
Theory
Module 1: 9 hours
Basic Concepts:
Speech Fundamentals: Articulatory Phonetics – Production and Classification of Speech Sounds;
Acoustic Phonetics – acoustics of speech production; Review of Digital Signal Processing
concepts; Short-Time Fourier Transform, Filter-Bank and LPC Methods.

Module 2: 9 hours
Speech Analysis:
Features, Feature Extraction and Pattern Comparison Techniques: Speech distortion measures –
mathematical and perceptual – Log Spectral Distance, Cepstral Distances, Weighted Cepstral
Distances and Filtering, Likelihood Distortions, Spectral Distortion using a Warped Frequency
Scale, LPC, PLP and MFCC Coefficients, Time Alignment and Normalization – Dynamic Time
Warping, Multiple Time-Alignment Paths.
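
As an illustration of the Dynamic Time Warping topic in this module, the following is a minimal sketch, not a prescribed course implementation. It assumes one-dimensional feature sequences, an absolute-difference local distance, and the basic step pattern (vertical, horizontal, diagonal); the function name `dtw_distance` is hypothetical. Real systems would typically compare frame vectors such as MFCCs and may add slope or window constraints.

```python
def dtw_distance(a, b):
    """Return the accumulated DTW cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance (assumption: absolute difference of scalars)
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated value can be absorbed by warping, whereas a frame-by-frame comparison of the two sequences is not even defined for unequal lengths.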

Module 3: 9 hours
Speech Modelling:
Hidden Markov Models: Markov Processes, HMMs – Evaluation, Optimal State Sequence –
Viterbi Search, Baum-Welch Parameter Re-estimation, Implementation issues.
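
The Viterbi search named in this module can be sketched as below. This is a minimal illustration under stated assumptions: a discrete-observation HMM stored in plain dictionaries, with a hypothetical two-state toy model whose state names, symbols, and probabilities are invented purely for demonstration. It multiplies raw probabilities for readability; implementations for long utterances usually work with log probabilities to avoid numerical underflow, which is one of the implementation issues this module refers to.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda pair: pair[1])
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model (all values are illustrative assumptions)
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)
```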

Module 4: 9 hours
Speech Recognition:
Large Vocabulary Continuous Speech Recognition: Architecture of a large vocabulary
continuous speech recognition system – acoustic and language models – n-grams,
context-dependent sub-word units; Applications and present status.
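
To make the n-gram language model topic concrete, here is a minimal sketch of a bigram model estimated by maximum likelihood from raw counts. The function names `train_bigram` and `bigram_prob`, the `<s>` and `</s>` boundary markers, and the three-sentence corpus are assumptions made for illustration. No smoothing is applied, so unseen bigrams receive probability zero; practical recognizers use smoothing or back-off.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Hypothetical toy corpus
corpus = ["recognize speech", "recognize speech now", "wreck a nice beach"]
contexts, bigrams = train_bigram(corpus)
print(bigram_prob("recognize", "speech", contexts, bigrams))
print(bigram_prob("speech", "now", contexts, bigrams))
```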

Module 5: 9 hours
Speech Synthesis:
Text-to-Speech Synthesis: Concatenative and waveform synthesis methods, sub-word units for
TTS, intelligibility and naturalness – role of prosody, Applications and present status.

Course Outcomes:
● Discuss the speech production and perception process.
● Express the speech signal in terms of its time domain and frequency domain
representations.
● Express the different ways in which the speech signal can be modeled.
● Analyze speech for automatic recognition and extraction of information.
● Design and implement algorithms for processing speech signals.
● Build a simple speech recognition system.

Text Books:
1. Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition”,
Pearson Education, 2003.
2. Daniel Jurafsky and James H Martin, “Speech and Language Processing – An
Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition”, Pearson Education.
3. K. Sayood, Introduction to Data Compression, 2nd Ed, Morgan Kaufmann, 2000.

Reference Books:
1. D. O'Shaughnessy, Speech Communications: Human and Machine, 2nd Ed, IEEE Press, 2000.
2. A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic,
1991.
Practicals
List of Experiments:

1. Lab Experiment: Introduction to Speech Signal Processing
a. Understanding speech fundamentals and articulatory phonetics
b. Analyzing the acoustics of speech production using digital signal processing
techniques
c. Implementing Short-Time Fourier Transform (STFT) and Filter-Bank methods
2. Lab Experiment: Speech Feature Extraction
a. Extracting speech features using LPC (Linear Predictive Coding)
b. Implementing Cepstral Distances and Log Spectral Distance for speech distortion
measures
c. Calculating MFCC (Mel-Frequency Cepstral Coefficients) for feature
representation
3. Lab Experiment: Dynamic Time Warping
a. Implementing Dynamic Time Warping (DTW) for time alignment and
normalization
b. Comparing multiple time alignment paths using DTW
4. Lab Experiment: Hidden Markov Models (HMMs)
a. Understanding Markov Processes and their applications
b. Implementing HMMs for speech modeling
c. Implementing the Viterbi algorithm for optimal state sequence estimation
5. Lab Experiment: Baum-Welch Parameter Re-estimation
a. Implementing the Baum-Welch algorithm for parameter re-estimation in HMMs
b. Training HMMs on speech data and updating model parameters
6. Lab Experiment: Large Vocabulary Continuous Speech Recognition (LVCSR)
a. Building the architecture of an LVCSR system
b. Integrating acoustic and language models
c. Implementing n-grams for language modeling
7. Lab Experiment: Context-Dependent Sub-word Units
a. Exploring context-dependent sub-word units in LVCSR systems
b. Training and evaluating models with sub-word units
8. Lab Experiment: Evaluating Speech Recognition Performance
a. Using standard evaluation metrics like Word Error Rate (WER) and Phoneme
Error Rate (PER)
b. Analyzing the performance of the speech recognition system
9. Lab Experiment: Speech Synthesis Techniques
a. Implementing Concatenative Speech Synthesis
b. Exploring waveform synthesis methods for Text-to-Speech (TTS)
10. Lab Experiment: Sub-word Units for Text-to-Speech (TTS)
a. Implementing sub-word units for TTS synthesis
b. Comparing the performance of different sub-word unit approaches
11. Lab Experiment: Role of Prosody in TTS
a. Understanding the role of prosody in speech synthesis
b. Manipulating prosodic features for naturalness in TTS
12. Lab Experiment: Evaluating TTS Synthesis
a. Using subjective evaluation methods for TTS intelligibility and naturalness
b. Comparing different TTS synthesis techniques
13. Lab Experiment: Speech Synthesis Applications
a. Implementing TTS for specific applications like voice assistants or audiobook
narration
b. Assessing the suitability of TTS for different scenarios
14. Lab Experiment: Present Status of Speech Technology
a. Surveying recent advancements in speech recognition and synthesis
b. Exploring state-of-the-art systems and applications
15. Lab Experiment: Comprehensive Speech Processing Project
a. Students work on a hands-on project that integrates concepts from all modules to
build a speech processing system. This project could involve aspects of speech
recognition, synthesis, and analysis using HMMs, DTW, and feature extraction
techniques.
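
As a starting point for the evaluation work in Experiment 8, Word Error Rate can be sketched as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with equal cost for substitutions, insertions, and deletions. Phoneme Error Rate follows the same computation over phoneme sequences instead of words.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.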
