Proj 2


GEORGIA INSTITUTE OF TECHNOLOGY
School of Electrical Engineering
EE6255 Project No. 2
Discrete-Time Models for the Speech Signal

Date Assigned: January 24, 2003
Date Due: February 14, 2003

Introduction

One of the most fruitful areas of application of digital signal processing is the processing of speech signals. The basis for most digital speech processing algorithms is a discrete-time system model for the production of the speech waveform. Many useful models have been used as the basis for speech synthesis, speech coding and speech recognition algorithms. One such model is depicted in Figure 1. The purpose of this project is to show how such a model can be related to a specific speech waveform.

[Figure 1 block diagram. Blocks: Impulse Train Generator (input: Pitch Period), Glottal Pulse Model G(z), Voiced/Unvoiced Switch, Random Noise Generator, Vocal Tract Model V(z) (input: Vocal Tract Parameters), Radiation Model R(z). Signals: u_G[n], u_L[n], p_L[n].]

Figure 1: Discrete-time system model for speech production.

Background Reading
The following references provide appropriate background for this project.

[1] G. Fant, Acoustic Theory of Speech Production, Mouton, The Hague, 1970.
[2] T. F. Quatieri, Discrete-Time Speech Signal Processing, Prentice-Hall, Inc., 2002.
[3] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1978.
[4] A. E. Rosenberg, "Effect of glottal pulse shape on the quality of natural vowels," Journal of the Acoustical Society of America, Vol. 49, No. 2, pp. 583-590, February 1971.

Getting Started

You will need to download the file project2_stuff.zip from WebCT. This file contains the M-files mentioned below.

Glottal Pulse Models

Project Description
The model of Figure 1 often underlies our thinking about the speech waveform, and in some cases, such a system is explicitly used as a speech synthesizer. In this part, we will study the part labeled Glottal Pulse Model G(z) in Figure 1.

Hints
In speech production, the excitation for voiced speech results from the quasi-periodic opening and closing of the gap between the vocal cords (the glottis). This is modeled in Figure 1 by the combination of the impulse train generator and the glottal pulse model filter. The shape of the pulse affects the magnitude and phase of the spectrum of the synthetic speech output of the model.

Exercise 3.1: The Exponential Model

A simple model that we will call the exponential model is represented by

$$G(z) = \frac{a z^{-1}}{(1 - a z^{-1})^2} \qquad (0.1)$$

Write an M-file to generate Npts samples of the corresponding glottal pulse waveform g[n] and also compute the frequency response of the glottal pulse model. The calling sequence for this function should be

    [gE,GE,W]=glottalE(a,Npts,Nfreq)

where gE is the exponential glottal waveform vector of length Npts, and GE is the frequency response of the exponential glottal model at the Nfreq frequencies W between 0 and π radians. You will use this function later.
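As a rough illustration of what the body of glottalE might compute, the sketch below obtains the pulse as the impulse response of Eq. (0.1) using filter( ) and evaluates the frequency response directly from the same equation. It is only one possible approach, written as a script rather than a function. The parameter values are illustrative, and the grid of Nfreq equally spaced frequencies on [0, π) is an assumption, since the assignment states only that W lies between 0 and π.

```matlab
% Sketch of the exponential glottal model (illustrative values, not prescribed).
a = 0.91;  Npts = 51;  Nfreq = 256;

% G(z) = a z^-1 / (1 - a z^-1)^2 written as numerator/denominator
% coefficient vectors in powers of z^-1.
num = [0 a];
den = [1 -2*a a^2];

% Impulse response: drive the filter with a unit impulse of length Npts.
impulse = [1 zeros(1, Npts-1)];
gE = filter(num, den, impulse);

% Frequency response evaluated directly from Eq. (0.1) on an assumed grid
% of Nfreq equally spaced frequencies in [0, pi).
W = (0:Nfreq-1) * pi / Nfreq;
zinv = exp(-1j * W);                 % z^-1 on the unit circle
GE = (a * zinv) ./ (1 - a * zinv).^2;
```

For this model the impulse response has the closed form g[n] = n a^n for n ≥ 0, which gives a quick check on gE.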

Exercise 3.2: The Rosenberg Model

Rosenberg [4] used inverse filtering to extract the glottal waveform from speech. Based on his experimental results, he devised a model for use in speech synthesis, which is given by the equation

$$g_R[n] = \begin{cases} \tfrac{1}{2}\left[1 - \cos(\pi n / N_1)\right], & 0 \le n \le N_1 \\ \cos\left[\pi (n - N_1)/(2 N_2)\right], & N_1 \le n \le N_1 + N_2 \\ 0, & \text{otherwise} \end{cases} \qquad (0.2)$$
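The two branches of Eq. (0.2) can be evaluated sample by sample. The following is a minimal sketch, assuming a row-vector output and the values N1 = 40 and N2 = 10 that appear later in the comparison exercise; the variable names n1 and n2 are hypothetical.

```matlab
% Sketch: samples of the Rosenberg pulse of Eq. (0.2), n = 0, ..., N1+N2.
N1 = 40;  N2 = 10;

n1 = 0:N1;                 % rising (opening) branch
n2 = N1+1:N1+N2;           % falling (closing) branch; n = N1 is covered above

gR = [0.5 * (1 - cos(pi * n1 / N1)), cos(pi * (n2 - N1) / (2 * N2))];
```

The two branches agree at n = N1, where both equal 1, so it does not matter which one supplies that sample.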

This model incorporates most of the important features of the time waveform of glottal waves estimated by inverse filtering and by high-speed motion pictures. Write an M-file to generate all N1 + N2 + 1 samples of a Rosenberg glottal pulse with parameters N1 and N2, and compute the frequency response of the Rosenberg glottal pulse model. The calling sequence for this function should be

    [gR,GR,W]=glottalR(N1,N2,Nfreq)

where gR is the Rosenberg glottal waveform vector of length N1+N2+1, and GR is the frequency response of the glottal model at the Nfreq frequencies W between 0 and π radians.

Exercise 3.3: Comparison of Glottal Pulse Models

In this exercise you will compare three glottal pulse models.

(a) First, use the M-files from Exercises 3.1 and 3.2 to compute Npts=51 samples of the exponential glottal pulse gE for a=0.91 and compute the Rosenberg pulse gR for the parameters N1=40 and N2=10.

(b) Also compute a new pulse gRflip by time reversing gR using the MATLAB function fliplr( ) for row vectors or flipud( ) for column vectors. This has the effect of creating a new causal pulse of the form

$$g_{Rflip}[n] = g_R[-(n - N_1 - N_2)] \qquad (0.3)$$

Determine the relationship between $G_{Rflip}(e^{j\omega})$, the Fourier transform of $g_{Rflip}[n]$, and $G_R(e^{j\omega})$, the Fourier transform of $g_R[n]$.

(c) Now plot all three of these 51-point vectors on the same graph using plot( ). Normalize the exponential glottal pulse by dividing by its maximum value before plotting. Also plot the frequency response magnitude in dB for all three pulses on the same graph. Experiment with the parameters of the models to see how the time-domain wave shapes affect the frequency response.

(d) The exponential model has a zero at z = 0 and a double pole at z = a. For the parameters N1=40 and N2=10, use the MATLAB function roots( ) to find the zeros of the z-transform of the Rosenberg model and also the zeros of the flipped Rosenberg model. Plot them using the MATLAB function zplane( ) or the function zpl( ) that is supplied in project2_stuff.zip. Note that the Rosenberg model has all its zeros outside the unit circle (except one at z = 0). Such a system is called a maximum-phase system. The flipped Rosenberg model, however, should be found to have all its zeros inside the unit circle, and thus, it is a minimum-phase system. Show that in general, if a signal is maximum-phase, then flipping it as in Eq. (0.3) produces a minimum-phase signal and vice-versa.
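Before working parts (b) and (d) analytically, a numerical check can help confirm a conjecture. The sketch below is not a proof and is only one way to set up such a check. It tests the candidate relation that flipping a real pulse of length L+1 conjugates its spectrum and adds a linear phase of L samples, and it uses a small made-up maximum-phase sequence (hypothetical zeros zMax, not the Rosenberg pulse) to show the zeros moving to their reciprocals. The frequency grid is the same assumed [0, π) grid as before.

```matlab
% Rosenberg pulse of Eq. (0.2) and its flipped version of Eq. (0.3).
N1 = 40;  N2 = 10;  L = N1 + N2;
n = 0:L;
rise = (n <= N1);
gR = zeros(1, L+1);
gR(rise)  = 0.5 * (1 - cos(pi * n(rise) / N1));
gR(~rise) = cos(pi * (n(~rise) - N1) / (2 * N2));
gRflip = fliplr(gR);

% Fourier transforms by direct summation on Nfreq frequencies in [0, pi).
Nfreq = 256;
W = (0:Nfreq-1) * pi / Nfreq;
E = exp(-1j * n(:) * W);              % (L+1)-by-Nfreq matrix of e^{-j*w*n}
GR = gR * E;
GRflip = gRflip * E;

% Candidate relation for a real pulse: GRflip = e^{-j*w*L} * conj(GR).
relationErr = max(abs(GRflip - exp(-1j * W * L) .* conj(GR)));

% Hypothetical maximum-phase example: four zeros outside the unit circle.
zMax = [2, -1.5, 1.25*exp(1j*pi/3), 1.25*exp(-1j*pi/3)];
hMax = real(poly(zMax));              % sequence h[0..4] with those zeros
hMin = fliplr(hMax);                  % flipped sequence
zMin = roots(hMin);                   % zeros of the flipped sequence
```

Here relationErr should be at rounding-error level, and abs(zMin) should match 1./abs(zMax), so the magnitude responses of the two pulses are identical while their phases differ.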

Lossless Tube Vocal Tract Models

Project Description
One approach to modeling sound transmission in the vocal tract is through the use of concatenated lossless acoustic tubes as depicted in Figure 2.

[Figure 2 diagram: seven tube sections with areas A1 through A7, running from the glottis, u_G(t), to the lips, u_L(t).]

Figure 2: Concatenation of (N=7) lossless acoustic tubes of equal length as a model of sound transmission in the vocal tract.

Using the acoustic theory of speech production [1-3], it can be shown that the lossless assumption and the regular structure lead to simple wave equations and simple boundary conditions at the tube junctions, so that a solution for the transmission properties of the model is relatively straightforward. The solution can be interpreted as in Figure 3(a), where $\tau = \Delta x / c$ is the one-way propagation delay of the sections. For sampled signals with sampling period $T = 2\tau$, the structure of Figure 3(a) (or equivalently Fig. 2) implies a corresponding discrete-time lattice filter as shown in Figure 3(b) or 3(c) [2,3].

Hints
Lossless tube models are useful for gaining insight into the acoustic theory of speech production, and they are also useful for implementing speech synthesis systems. We have shown that if rG = 1, the discrete-time vocal tract model consisting of a concatenation of N lossless tubes of equal length has system function
$$V(z) = \frac{\prod_{k=1}^{N} (1 + r_k)\, z^{-N/2}}{D(z)} \qquad (0.1)$$

[Figure 3 signal flow graphs, panels (a), (b) and (c). Labels include the input gain (1 + r_G)/2, the reflection coefficients r_G, r_1, r_2 and r_L, the branch gains (1 ± r_k), and delay elements: Delay in (a), z^{-1/2} in (b), z^{-1} in (c).]

Figure 3: (a) Signal flow graph for lossless tube model (N = 3) of the vocal tract; (b) equivalent discrete-time system; (c) equivalent discrete-time system using only whole delays in ladder part.

The denominator polynomial D(z) in Eq. (0.1) satisfies the polynomial recursion [2,3]

$$\begin{aligned} D_0(z) &= 1 \\ D_k(z) &= D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), \qquad k = 1, 2, \ldots, N \\ D(z) &= D_N(z) \end{aligned} \qquad (0.2)$$

where the $r_k$'s in Eq. (0.2) are the reflection coefficients at the tube junctions,

$$r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \qquad (0.3)$$

In deriving the recursion in Eq. (0.2), it was assumed that there were no losses at the glottal end ($r_G = 1$) and that all the losses are introduced at the lip end through the reflection coefficient

$$r_N = r_L = \frac{A_{N+1} - A_N}{A_{N+1} + A_N} \qquad (0.4)$$

where $A_{N+1}$ is the area of an impedance-matched tube that can be chosen to introduce a loss in the system. Suppose that we have a set of areas for a lossless tube model, and we wish to obtain the system function for the system so that we can use the MATLAB filter( ) function to implement the model; i.e., we want to obtain the system function of Eq. (0.1) in the form

$$V(z) = \frac{G}{D(z)} = \frac{G}{1 - \sum_{k=1}^{N} \alpha_k z^{-k}} \qquad (0.5)$$

(Note that we have dropped the delay of N/2 samples, which is inconsequential for use in synthesis and would be impossible to implement when N is odd.) The following MATLAB M-file implements Eqs. (0.2) and (0.3); i.e., it takes an array of tube areas and a reflection coefficient at the lip end and finds the parameters of Eq. (0.5) along with the reflection coefficients.

    function [r,D,G]=AtoV(A,rN)
    % function to find reflection coefficients
    % and system function denominator for
    % lossless tube models.
    % [r,D,G]=AtoV(A,rN)
    % rN = reflection coefficient at lips (abs value < 1)
    % A = array of areas
    % D = array of denominator coefficients
    % G = numerator of transfer function
    % r = corresponding reflection coefficients
    % assumes no losses at the glottis end (rG=1).
    [M,N]=size(A);
    if(M~=1) A=A.'; end    % make row vector
    N=length(A);
    r=[];
    for m=1:N-1            % junction reflection coefficients, Eq. (0.3)
        r=[r (A(m+1)-A(m))/(A(m+1)+A(m))];
    end
    r=[r rN];              % append the lip-end reflection coefficient
    D=[1];
    G=1;
    for m=1:N              % polynomial recursion, Eq. (0.2)
        G=G*(1+r(m));
        D=[D 0] + r(m).*[0 fliplr(D)];
    end

As test data for this project, the following area functions were estimated from data obtained by Fant [1].

    Section    1     2     3     4     5     6     7     8     9     10
    vowel AA   1.6   2.6   0.65  1.6   2.6   4     6.5   8     7     5
    vowel IY   2.6   8     10.5  10.5  8     4     0.65  0.65  1.3   3.2
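To show how the recursion behaves on the test data, the script below repeats the same steps inline for the vowel AA areas with rN = 0.71, then converts the pole angles to frequencies in Hz. This is a sketch only: it is written as a stand-alone script so that it runs without AtoV( ) on the path, the 10000 samples/sec rate is the one quoted in the exercises, and the names fs, p and F are hypothetical.

```matlab
% Vowel AA areas from the table, with the lip reflection coefficient of Exercise 4.1.
A  = [1.6 2.6 0.65 1.6 2.6 4 6.5 8 7 5];
rN = 0.71;
fs = 10000;                               % sampling rate 1/T in samples/sec

% Reflection coefficients at the N-1 junctions (Eq. (0.3)), then the lip end.
r = [(A(2:end) - A(1:end-1)) ./ (A(2:end) + A(1:end-1)), rN];

% Step-up recursion of Eq. (0.2): D_k(z) = D_{k-1}(z) + r_k z^-k D_{k-1}(1/z).
D = 1;
G = 1;
for k = 1:length(r)
    G = G * (1 + r(k));
    D = [D 0] + r(k) * [0 fliplr(D)];
end

% Poles of V(z): keep one of each conjugate pair, convert angle to Hz.
p = roots(D);
p = p(imag(p) > 0);
F = sort(angle(p)) * fs / (2*pi);
```

The resulting D can be compared with the 10th-order denominator quoted for vowel AA in Exercise 4.2; the first and last coefficients should come out near -0.0460 and exactly 0.71.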

Exercise 4.1: Frequency Response and Pole/Zero Plot

(a) Use the M-file AtoV( ) to obtain the denominator D(z) of the vocal tract system function, and make plots of the frequency response for both area functions for rN=.71 and also for the totally lossless case rN = 1.

(b) Factor the polynomials D(z) and plot the poles in the z-plane using zplane( ) or zpl( ). Convert the angles of the roots to analog frequencies corresponding to a sampling rate of 1/T = 10000 samples/sec, and compare to the formant frequencies expected for these vowels. [See Lecture 3] For this sampling rate, what is the effective length of the vocal tract in cm?

Exercise 4.2: Finding the Model from the System Function

The inverse problem arises when we want to obtain the areas and reflection coefficients for a lossless tube model given the system function in the form of Eq. (0.5). We know that the denominator of the system function, D(z), satisfies Eqs. (0.2). In this part we will use Eqs. (0.2) to develop an algorithm for finding the reflection coefficients and the areas of a lossless tube model having a given system function. Parts (a)-(c) below are covered in Problem 2.6 of Problem Set #2.

(a) Show that $r_N$ is equal to the coefficient of $z^{-N}$ in the denominator of V(z); i.e., $r_N = -\alpha_N$.

(b) Use Eqs. (0.2) to show that

$$D_{k-1}(z) = \frac{D_k(z) - r_k z^{-k} D_k(z^{-1})}{1 - r_k^2}, \qquad k = N, N-1, \ldots, 2$$

(c) How would you find $r_{k-1}$ from $D_{k-1}(z)$?

(d) Using the results of parts (a), (b), and (c), state an algorithm for finding all of the reflection coefficients $r_k$, k = 1, 2, ..., N and all of the tube areas $A_k$, k = 1, 2, ..., N. Are the $A_k$'s unique? Write a MATLAB function to implement your algorithm for converting from D(z) to reflection coefficients and areas. This M-file should be defined as follows:

    function [r,A]=VtoA(D,A1)
    % function to find reflection coefficients
    % and tube areas for lossless tube models.
    % [r,A]=VtoA(D,A1)
    % A1 = arbitrary area of first section
    % D = array of denominator coefficients
    % A = array of areas for lossless tube model
    % r = corresponding reflection coefficients
    % assumes no losses at the glottis end (rG=1).

For the vowel AA, the denominator of the 10th-order model should be (to 4 digit accuracy)

$$D(z) = 1 - 0.0460z^{-1} - 0.6232z^{-2} + 0.3814z^{-3} + 0.2443z^{-4} + 0.1973z^{-5} + 0.2873z^{-6} + 0.3655z^{-7} - 0.4806z^{-8} - 0.1153z^{-9} + 0.7100z^{-10}$$

Use your MATLAB program to find the corresponding reflection coefficients and tube areas and compare to the data for the vowel AA in the table above.
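One way to sanity-check an implementation of the inverse algorithm is a round trip: build D(z) from known areas with the forward recursion, run the step-down relation of part (b), and confirm that the original reflection coefficients and areas come back. The script below sketches that check under stated assumptions. It is not the required VtoA( ) function, the variable names Dk, rRec and ARec are hypothetical, and the first area is fixed to the known A(1), since only area ratios are determined by D(z).

```matlab
% Forward pass: vowel AA areas -> reflection coefficients -> D(z).
A  = [1.6 2.6 0.65 1.6 2.6 4 6.5 8 7 5];
rN = 0.71;
r  = [(A(2:end) - A(1:end-1)) ./ (A(2:end) + A(1:end-1)), rN];
D  = 1;
for k = 1:length(r)
    D = [D 0] + r(k) * [0 fliplr(D)];
end

% Inverse pass: peel off one reflection coefficient per step.
N    = length(D) - 1;
Dk   = D;
rRec = zeros(1, N);
for k = N:-1:1
    rRec(k) = Dk(end);                               % coefficient of z^-k in D_k(z)
    Dk = (Dk - rRec(k) * fliplr(Dk)) / (1 - rRec(k)^2);
    Dk = Dk(1:end-1);                                % trailing coefficient is now zero
end

% Areas from r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k), given a chosen A1.
A1   = A(1);
ARec = zeros(1, N);
ARec(1) = A1;
for k = 1:N-1
    ARec(k+1) = ARec(k) * (1 + rRec(k)) / (1 - rRec(k));
end
```

Note that the step-down division by 1 - r_k^2 fails when any |r_k| = 1, so the totally lossless case rN = 1 cannot be inverted this way.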

Vowel Synthesis

Project Description
For voiced speech, the speech model of Fig. 1 can be simplified to the system of Fig. 4. The excitation signal e[n] is a quasi-periodic impulse train and the glottal pulse model could be either the exponential or the Rosenberg pulse. The vocal tract model could be a lattice filter of the form of Fig. 3(c) or it could be an equivalent direct form difference equation as implemented by MATLAB's filter( ) or conv( ) function.

[Figure 4 block diagram: e[n] → Glottal Pulse G(z) → Vocal Tract V(z) → Radiation $R(z) = 1 - z^{-1}$ → s[n].]

Figure 4: Simplified model for synthesizing voiced speech.

Hints
In this project we will use the MATLAB filter( ) and conv( ) functions to implement the system of Fig. 4 and thereby synthesize periodic vowel sounds.

Exercise 5.1: Periodic Vowel Synthesis

Assume a sampling rate of 10000 samples/sec. Create a periodic impulse train vector e of length 1000 samples with period corresponding to a fundamental frequency of 100 Hz. Then use either filter( ) or conv( ) to implement the system of Fig. 4. Use the excitation e and radiation system $R(z) = 1 - z^{-1}$ to synthesize speech for both area functions given above, and for all three glottal pulses studied in Exercise 3.3. Use subplot( ) and plot( ) to make a plot comparing 1000 samples of the synthetic speech outputs for the exponential glottal pulse and the Rosenberg minimum-phase pulse. Make another plot comparing the outputs for the two Rosenberg pulses.

Exercise 5.2: Frequency Response of Vowel Synthesizer

Plot the frequency response of the overall system with system function H(z) = G(z)V(z)R(z) for the case of the Rosenberg glottal pulse, $R(z) = 1 - z^{-1}$, and vocal tract response for the vowel IY.

Exercise 5.3: Listening to the Output

Create a file of length corresponding to 0.5 sec. duration and play it out through the D/A system using MATLAB's soundsc( ) function. Does the synthetic speech sound like the desired vowels?
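The cascade of Fig. 4 can be implemented as three successive calls to filter( ). The script below is a minimal sketch for one case only (exponential glottal pulse, vowel AA, rN = 0.71). The parameter a = 0.91 and the variable names P, uG, uL and s are illustrative assumptions rather than values the assignment fixes for this exercise.

```matlab
% Excitation: impulse train at f0 = 100 Hz, sampled at fs = 10000 samples/sec.
fs = 10000;  f0 = 100;  Nsamp = 1000;
P = round(fs / f0);                  % pitch period in samples
e = zeros(1, Nsamp);
e(1:P:end) = 1;                      % impulses at n = 0, P, 2P, ...

% Glottal pulse: exponential model G(z) = a z^-1 / (1 - a z^-1)^2.
a  = 0.91;
bG = [0 a];
aG = [1 -2*a a^2];

% Vocal tract: numerator gain G and denominator D from the tube areas (vowel AA).
A  = [1.6 2.6 0.65 1.6 2.6 4 6.5 8 7 5];
rN = 0.71;
r  = [(A(2:end) - A(1:end-1)) ./ (A(2:end) + A(1:end-1)), rN];
D = 1;
G = 1;
for k = 1:length(r)
    G = G * (1 + r(k));
    D = [D 0] + r(k) * [0 fliplr(D)];
end

% Cascade: glottal pulse, vocal tract, then radiation R(z) = 1 - z^-1.
uG = filter(bG, aG, e);
uL = filter(G, D, uG);
s  = filter([1 -1], 1, uL);
```

For a Rosenberg pulse, the first stage could instead be filter(gR, 1, e), since that pulse is a finite-length sequence. For listening, Nsamp would be raised to 5000 samples (0.5 sec at this rate) before passing s and fs to soundsc( ).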

Report

Submit a typewritten report including appropriate plots and images to illustrate your work. Learn to include graphics in your report either with LaTeX or MS Word, or whatever you use for this sort of thing. You should structure your report along the lines of the sections of this project assignment. Be sure to answer all the specific questions asked above and provide graphs and MATLAB code wherever appropriate.
