Papers by Corinne Fredouille
This paper addresses speaker diarization in the specific context of meeting room recordings. First, several technical improvements to an E-HMM based system are proposed and evaluated in the framework of the NIST RT’06S evaluation campaign. The related experiments show absolute gains in overall speaker diarization error rate (DER) of 6.4% and 12.9% on the development and evaluation corpora, respectively. Second, the paper presents an original strategy for handling overlapping speech. Overlaps between speakers are frequent in meetings because of the spontaneous nature of this kind of data, and they degrade the performance of a speaker diarization system if left untreated. Experiments, also conducted in the framework of the NIST RT’06S evaluation, show that the strategy is able to detect overlapping speech (a decrease in the missed speaker error rate), although an overall gain in speaker diarization performance has not yet been achieved.
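As a rough illustration of the metric quoted in the abstract above, DER is conventionally the sum of missed speaker time, false alarm time, and speaker confusion time, divided by the total scored reference speaker time. The sketch below assumes these durations are already available; the function name and all numbers are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of the total scored reference speaker time.

    All arguments are durations in seconds. The values used below are
    hypothetical and serve only to illustrate the arithmetic.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations (seconds), not results from the paper.
der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, confusion=18.0, total_speech=600.0
)
print(f"DER = {der:.1%}")
```

Under this definition, detecting overlapping speech can lower the missed speaker term while raising the false alarm or confusion terms, which is one way the overall DER can fail to improve.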
... This data set is made up of 230 male and 309 female speakers. The training material consists of two minutes of speech recorded over two sessions. ... Experiments should be conducted on multi-speaker tracking, where this system should perform quite well. ...
This paper summarizes the collaboration between the LIA and CLIPS laboratories, members of the ELISA consortium, over the last four years of NIST speaker diarization evaluation campaigns. In this context, each lab independently developed its own, quite different, approach to the specific task of speaker segmentation. The first relies on a classical two-step speaker segmentation strategy, based on the detection of speaker turns followed by a clustering process, while the second is an integrated strategy in which both segment boundaries and the speaker tying of the segments are extracted simultaneously and challenged throughout the process. Starting from these two main methods, various strategies were investigated for fusing the segmentation results. Drawing on the performance achieved across the different evaluation campaigns and on the experience gained by the LIA and CLIPS labs in the speaker diarization task, the paper discusses the overall work done in this evaluation context and proposes directions for further investigation.
Abstract: For the task of speaker verification, similarity measure normalization methods are relevant for coping with variability problems and with data and/or decision fusion issues. The aim of this paper is to suggest a new normalization method, which combines classical world model-based normalization techniques with a posteriori probability-based ones. This method offers the well-known advantages of the a posteriori probability-based methods without requiring data and speaker specific processing. Here,...
... However, transmitting speech via TCP connections is not very realistic due to real-time needs in most of the ... i.e. we initiated a transatlantic connection between France and Mexico with a videoconferencing software but we ... [7] DA Reynolds, Speaker identification and verification ...
Speech Communication, 2000
Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in laboratories but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which piece of information is taken into account by the system. In order to solve this problem and to improve ...
This paper presents the ELISA consortium's activities in automatic speaker segmentation, also known as speaker diarization, during the NIST Rich Transcription (RT) 2003 evaluation. The experiments were conducted on real broadcast news data (HUB4). Two different approaches from the CLIPS and LIA laboratories are presented, and different ways of combining them are investigated in the framework of the ELISA consortium. The system submitted as the ELISA primary system obtained the second-lowest segmentation error rate among the primary systems of the RT03 participants. Another ELISA system, submitted as a secondary system, outperformed the best primary system and obtained the lowest speaker segmentation error rate.
Concerned with pathological voice assessment, this paper aims to characterize dysphonia in the frequency domain in order to better understand the related phenomena, whereas most studies have focused only on improving classification systems for diagnostic support. In this context, a GMM-based automatic classification system is applied to different frequency ranges in order to investigate which ones are relevant for dysphonia characterization. Experimental results show that the low frequencies, [0-3000] Hz, are more relevant for dysphonia discrimination than higher frequencies.
This paper investigates the effect of voice transformation on the performance of automatic speaker recognition systems. We focus on increasing the impostor acceptance rate by modifying the voice of an impostor in order to target a specific speaker. The paper is based on the following idea: in several applications, and particularly in forensic situations, it is reasonable to think that some organizations know which speaker recognition method is used and could impersonate a given, well-known speaker. The paper presents experiments based on the NIST SRE 2005 protocol and a simple impostor voice transformation method. The results show that this simple voice transformation drastically increases the false acceptance rate without degrading the natural aspect of the voice.
Classical adaptation approaches are generally used for speaker or environment adaptation of speech recognition systems. In this paper, we use such techniques for the incremental training of client models in a speaker verification system. The initial model is trained on a very limited amount of data and then progressively updated with access data, using a segmental-EM procedure. In supervised mode (i.e. when access utterances are certified), the incremental approach yields performance equivalent to that of the batch approach. We also investigate the impact of various scenarios of impostor attacks during the incremental enrollment phase. All results are obtained with the Picassoft platform, the state-of-the-art speaker verification system developed in the PICASSO project.
The paper investigates the value of segmentation into acoustic macro classes (such as gender or bandwidth) as front-end processing for the segmentation/diarization task. The impact of this prior acoustic segmentation is evaluated in terms of speaker diarization performance in the particular context of the NIST RT'03 evaluation (carried out on the HUB4 broadcast news corpora). Although it is rarely discussed in the literature, our work shows that applying prior acoustic segmentation, as is done for the automatic speech recognition task, may be very useful for the speaker segmentation task. Experiments were conducted using two different kinds of speaker segmentation systems developed individually by the LIA and CLIPS laboratories in the framework of the ELISA consortium. For both systems, an improvement was observed when they were combined with prior acoustic segmentation. However, the impact on performance is larger for the LIA system, based on an ascending/HMM approach, than for the CLIPS system, based on speaker turn detection.
This paper investigates the classes of information relevant to the automatic classification of pathological voices. Using a GMM-based classification system (derived from the Automatic Speaker Recognition domain), the study focuses on three main classes of information: energetic, voiced, and phonetic information. Experiments on a pathological corpus (dysphonia) show that phonetic information is particularly interesting in this context, since it makes it possible to refine the selection of relevant information by working at the phoneme or phoneme-class level (e.g. nasal vowels).
EURASIP Journal on Advances in Signal Processing, 2004
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. The paper concludes by giving a few research trends in speaker verification for the next couple of years.
Seeking within a speech sequence the speaker utterances is one of the main tasks of indexing. In this paper, the proposed speaker tracking system is defined in the case where all speaker identities are known beforehand. The conversation is modeled as an evolutive ...
This paper presents the speaker tracking system of the LIA laboratory, validated during the ESTER 2005 campaign on a radio broadcast news corpus of about 90 h. The LIA speaker tracking system first uses an acoustic class segmentation in order to suppress non-speech frames and to detect the speech conditions. Second, a speaker diarization process is applied in order to provide the speaker detection system (the last step) with speaker-homogeneous segments (boundaries and clustering). The speaker detection system uses UBM/GMM likelihood ratios in order to decide whether a segment belongs to one of the tracked speakers. The speaker tracking system is presented along with some results obtained during the ESTER 2005 campaign. The presented systems are based on the ALIZE platform (an automatic speaker recognition C++ library).
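The UBM/GMM likelihood-ratio decision mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration using diagonal-covariance GMMs and an average per-frame log-likelihood ratio. The function names, toy models, and threshold are all hypothetical, and it does not reproduce the ALIZE implementation or its API.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of frame x under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))


def segment_llr(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio between speaker model and UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)


# Toy one-dimensional models (hypothetical, for illustration only).
speaker_gmm = [(1.0, [1.0], [1.0])]
ubm = [(1.0, [0.0], [1.0])]
frames = [[1.0], [1.0]]

llr = segment_llr(frames, speaker_gmm, ubm)
threshold = 0.0  # hypothetical decision threshold
accepted = llr > threshold
print(f"LLR = {llr:.3f}, segment attributed to tracked speaker: {accepted}")
```

In practice the threshold would be tuned on development data, and the speaker model would typically be adapted from the UBM instead of being specified by hand as it is here.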
Digital Signal Processing, 2000
... Keywords: Multirecognizer architecture; block-segmental approach; frequency domain; dynamic information; automatic speaker verification; MAP normalization. ... D. Kryze and CJ Wellekens, Detection of speaker changes in an audio document, Eurospeech'99, Budapest (1999) p ...
This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to pathological voice assessment (dysphonic voices). The aim of this study is to provide a novel method suitable for tracking the evolution of a patient's pathology: easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method will complement the existing ones, namely perceptual judgment and the usual objective measurements (jitter, airflows...), which remain costly in time and human resources. The system designed for this particular task relies on the GMM-based approach, which is the state of the art for speaker recognition. It is derived from the open source ASR tools (LIA_Spk-Det and ALIZE) of the LIA lab. Experiments conducted on a dysphonic corpus provide promising results, underlining the value of such an approach and opening the way to further research.