In this paper we propose an automatic approach to foreign accent identification. Knowing the speaker's origin could make it possible to adapt the acoustic models for non-native speech recognition. In this study, we use a statistical approach based on prosodic parameters. This approach relies on the fact that prosody differs between languages, and therefore between accents. This work ...
In this article, we present an approach for non-native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to non-native accents. The first method is based on the modification of acoustic models through the integration of acoustic models from the mother tongue. Speakers pronounce the phonemes of the target language in a manner similar to those of their native language. We propose to combine the models of confused phonemes so that the ASR system can recognize both concurrent pronunciations. The second method we propose is a refinement of the pronunciation error detection through the introduction of graphemic constraints. Indeed, non-native speakers may rely on the spelling of words when uttering them. Thus, the pronunciation errors might depend on the characters composing the words. The average error rate reduction that we observed is 22.5% (relative) in sentence error rate and 34.5% (relative) in word error rate.
Interspeech 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, 2005
This paper presents a fully automated approach for the recognition of non-native speech based on acoustic model modification. For a native language (L1) and a spoken language (L2), pronunciation variants of the phones of L2 are automatically extracted from an existing non-native database as a confusion matrix with sequences of phones of L1. This is done using the ASR systems of L1 and L2. This confusion concept deals with the problem that some L2 phones have no matching L1 phone. The confusion matrix is then used to modify the acoustic models (HMMs) of L2 phones by integrating the corresponding L1 phone models as alternative HMM paths. In this way, no lexicon modification is carried out. The modified ASR system achieved an improvement between 32% and 40% (relative, L1=French and L2=English) in WER on the French non-native database used for testing.
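The confusion extraction described in this abstract can be illustrated with a minimal sketch. It assumes that each L2 phone segment has already been aligned with the L1 phone sequence recognized over the same time span. The function name `build_confusions`, the `min_prob` threshold and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

def build_confusions(aligned_pairs, min_prob=0.2):
    """Estimate P(L1 phone sequence | L2 phone) from aligned pairs.

    aligned_pairs: iterable of (l2_phone, l1_phone_sequence) tuples.
    min_prob: hypothetical cut-off below which a variant is discarded.
    """
    counts = defaultdict(Counter)
    for l2_phone, l1_seq in aligned_pairs:
        counts[l2_phone][tuple(l1_seq)] += 1

    confusions = {}
    for l2_phone, seq_counts in counts.items():
        total = sum(seq_counts.values())
        # Keep only the variants that are frequent enough for this L2 phone.
        confusions[l2_phone] = {
            seq: n / total for seq, n in seq_counts.items() if n / total >= min_prob
        }
    return confusions

# Invented toy alignment: an English phone realized as French phone sequences.
pairs = [("th", ["s"]), ("th", ["s"]), ("th", ["z"]), ("th", ["t", "s"]), ("ih", ["i"])]
confusions = build_confusions(pairs, min_prob=0.25)
```

Each retained L1 sequence would then correspond to one alternative HMM path for the L2 phone. How the paths are weighted is not stated in the abstract, so the relative frequencies used here are only one plausible choice.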
Computer Speech and Language, Apr 1, 2010
This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transform (FT), which is well suited to stationary signals. To take into account the non-stationarity of music/speech signals, this work proposes to study wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech, one for music/non-music. Different experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterizations, especially for music/non-music discrimination. Our parameterization yielded a significant reduction of the error rate. More than 30% relative improvement was obtained for the envisaged tasks compared to MFCC parameterization.
Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), Oct 3, 1996
In this paper, several techniques for reducing the search complexity of beam search for continuous speech recognition are proposed. Six heuristic pruning methods are described, and the pruning parameters are adjusted to keep the word error rate constant while reducing the computational complexity and memory demand. The effect of each pruning method is evaluated with the Mixture Stochastic Trajectory Model (MSTM). MSTM is a segment-based model using phonemes as the speech units. A set of tests in a speaker-dependent continuous speech recognition task shows that the pruning methods reduce the search effort by 67%, measured as the number of phonemes hypothesised during the search. All proposed techniques are independent of the acoustic models and are therefore applicable to other acoustic modeling techniques.
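As an illustration of the kind of pruning involved, here is a minimal sketch of two generic beam-search pruning rules: a score beam relative to the best hypothesis, and a cap on the number of active hypotheses. These are standard techniques chosen for the example and are not claimed to be among the six heuristics of the paper. The function name and parameters are hypothetical.

```python
def prune(hypotheses, beam_width, max_active):
    """Prune a list of (hypothesis, log_score) pairs.

    1. Beam pruning: drop hypotheses scoring more than beam_width below the best.
    2. Histogram pruning: keep at most max_active of the survivors.
    """
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    survivors = [(h, s) for h, s in hypotheses if s >= best - beam_width]
    survivors.sort(key=lambda item: item[1], reverse=True)
    return survivors[:max_active]

# Toy phoneme hypotheses with log-scores (invented values).
hyps = [("a", -10.0), ("b", -12.5), ("c", -30.0), ("d", -11.0)]
kept = prune(hyps, beam_width=5.0, max_active=2)
```

Tightening `beam_width` or `max_active` lowers the number of hypotheses expanded at the next step, at the risk of discarding the path that would eventually have scored best. This is the trade-off the abstract describes when it tunes the pruning parameters at constant word error rate.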
Annual Conference of the International Speech Communication Association, 2003
This paper proposes two new approaches to rapid speaker adaptation of acoustic models using genetic algorithms. Whereas conventional speaker adaptation techniques yield adapted models that represent locally optimal solutions, genetic algorithms are capable of providing multiple optimal solutions, thereby delivering potentially more robust adapted models. We have investigated two different strategies for applying the genetic algorithm in the framework of speaker adaptation of acoustic models. The first approach (GA) consists in using a genetic algorithm to adapt the set of Gaussian means to a new speaker. The second approach uses the genetic algorithm to enrich the set of speaker-dependent systems employed by EigenVoices. Experiments with the Resource Management corpus show that, with one adaptation utterance, GA can improve the performance of a speaker-independent system as efficiently as EigenVoices. The second approach outperforms EigenVoices.
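The idea of adapting Gaussian means with a genetic algorithm can be sketched on a toy problem. The example below assumes a single unit-variance Gaussian and uses truncation selection, uniform crossover, Gaussian mutation and elitism. All of these choices, along with the function names and parameter values, are assumptions made for illustration; the abstract does not specify the operators actually used.

```python
import random

def fitness(mean, frames):
    """Log-likelihood (up to a constant) of frames under a unit-variance Gaussian."""
    return -0.5 * sum((x - m) ** 2 for frame in frames for x, m in zip(frame, mean))

def adapt_mean(si_mean, frames, pop_size=20, generations=50, sigma=0.5, seed=0):
    """Evolve a mean vector that fits the adaptation frames better."""
    rng = random.Random(seed)
    # Initial population: the speaker-independent mean plus perturbed copies.
    population = [list(si_mean)] + [
        [m + rng.gauss(0.0, sigma) for m in si_mean] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, frames), reverse=True)
        parents = population[: pop_size // 2]      # truncation selection
        children = [list(parents[0])]              # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]          # uniform crossover
            child = [g + rng.gauss(0.0, sigma * 0.1) for g in child]  # mutation
            children.append(child)
        population = children
    return max(population, key=lambda ind: fitness(ind, frames))

# Toy data: a speaker-independent mean and three adaptation frames (invented).
si_mean = [0.0, 0.0]
frames = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
adapted = adapt_mean(si_mean, frames)
```

Because the unchanged speaker-independent mean is part of the initial population and the best individual is always carried over, the returned mean never scores worse than the starting point on the adaptation data.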
The problem of speech/music discrimination is a challenging research problem which significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We use a wavelet-based decomposition of the audio signal, which allows a good analysis of non-stationary signals like speech or music. We compute different ...
In this paper we present an automated method for the classification of the origin of non-native speakers. The origin of non-native speakers could be identified by a human listener based on the detection of typical pronunciations for each nationality. Thus we suppose the existence of several phoneme sequences that might allow the classification of the origin of ...
In the framework of ESTER, the recent French broadcast radio news transcription task evaluation, we have developed the first version of ANTS, the Automatic News Transcription System of LORIA. This paper describes the different components of the ANTS system and provides some first recognition results on the ESTER database. Then it presents several experiments carried out on this system to take into account the specificities of the French language: how accurate the phone models should be and how to deal with the problem of liaisons between words.
Annual Conference of the International Speech Communication Association, 2006
... E. Didiot, I. Illina, O. Mella, D. Fohr, J.-P. Haton LORIA-CNRS & INRIA Lorraine BP 239, 54506 Vandoeuvre-les-Nancy, France {didiot,illina,mella,fohr,jph}@loria.fr Abstract ... $f_j = \log_{10}\left(\frac{1}{N_j}\sum_{k=1}^{N_j}\left(w_k^j\right)^2\right)$ Logarithm of Teager energy (TE). ...
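The band feature in this excerpt, the base-10 logarithm of the mean squared wavelet coefficient in band j, can be computed as in the following sketch. The Haar wavelet, the number of levels, the energy floor and the function names are assumptions chosen to keep the example self-contained; the excerpt does not say which wavelet family or depth was used.

```python
import math

def haar_dwt(signal, levels):
    """Return the detail coefficients of each level (orthonormal Haar wavelet)."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        bands.append(detail)
    return bands

def log_band_energy(coeffs, floor=1e-12):
    """f_j = log10((1 / N_j) * sum_k (w_k^j)^2), floored to avoid log(0)."""
    mean_energy = sum(w * w for w in coeffs) / len(coeffs)
    return math.log10(max(mean_energy, floor))

# Toy frame: 64 samples of a sinusoid (invented signal).
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = [log_band_energy(band) for band in haar_dwt(frame, levels=3)]
```

Here `features` holds one value per decomposition level, which would form a static feature vector for the frame.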
In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation, and HMM model retraining on the HIWIRE foreign-accented English speech database. The "phonetic confusion" scheme we have developed consists in associating with each spoken phone several sequences of confused phones. In our experiments, we ...
Annual Conference of the International Speech Communication Association, 2004
This paper presents the recent development of ANTS, the Automatic News Transcription System of LORIA. This system was designed in the framework of ESTER, the French broadcast radio news transcription task evaluation. After describing its different components and some segmentation and recognition results on the ESTER database, we present a number of experiments focusing on the real-time version of ANTS.
Papers by Irina Illina