Proceedings of the Seventh International Workshop on Natural Language Generation - INLG '94, 1994
In this paper, we address the issue of integrating semantic lexicons into NLG systems and argue that the problem of lexical choice in generation can be approached only by such an integration. We take the approach of Generative Lexicon Theory (GLT) (Pustejovsky, 1991, 1994c), which provides a system involving four levels of representation connected by a set of generative devices accounting for a compositional interpretation of words in context. We are interested in showing that we can reduce the set of collocations listed in the lexicon by introducing the notion of "semantic collocations", which can be predicted within the GLT framework. We argue that the lack of well-defined semantic calculi in previous approaches, whether linguistic or conceptual, renders them unable to account for semantic collocations.
Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, 2000
In this paper we describe the construction of a part-of-speech tagger for both medical document retrieval and XP extraction. We have therefore designed a dual system: for retrieval purposes, we rely on a rule-based architecture, called minimal commitment, which may be complemented by a data-driven tool (HMM) when full disambiguation is necessary.
We present an original system for locating and removing personally-identifying information in patient records. In this experiment, anonymization is seen as a particular case of knowledge extraction. We use natural language processing tools provided by the MEDTAG framework: a semantic lexicon specialized in medicine, and a toolkit for word-sense and morpho-syntactic tagging. The system finds 98-99% of all personally-identifying information.
Studies in health technology and informatics, 2000
In this paper we describe the construction of a part-of-speech tagger for medical document retrieval purposes. To this end, we have designed a specific architecture called minimal commitment. The system uses local grammatical rules to carry out the disambiguation task. Four evaluations are conducted, with and without taking unknown words into account. Between evaluations, the modules of the system (lexicon, guesser, rules) are incrementally improved.
Executive summary: Spoken Language Translator (SLT) is a project whose long-term goal is the construction of practically useful systems capable of translating human speech from one language into another. The current SLT prototype, described in detail in this report, is ...
... Pierrette Bouillon*, Cécile Fabre**, Pascale Sébillot***, Laurence Jacqmin**** Résumé - Abstract ... Fabre (Fabre C. 1996) showed that the N-V links expressed in the qualia make it possible to compute the semantic representation of noun phrases (cf. ...
Previous studies have shown that pre-editing techniques can handle the extreme variability and uneven quality of user-generated content (UGC), improve its machine-translatability and reduce post-editing time. Nevertheless, it seems important to find out whether real users of online communities, the real-life scenario targeted by the ACCEPT project, are linguistically competent and willing to pre-edit their texts according to specific pre-editing rules. We report the findings from a user study with real French-speaking forum users who were asked to apply pre-editing rules to forum posts using a specific forum plugin. We analyse the interaction of users with pre-editing rules and evaluate the impact of the users' pre-edited versions on translation, as the ultimate goal of the ACCEPT project is to facilitate sharing of knowledge between different language communities.
We describe a series of experiments in which we start with English to French and English to Japanese versions of a rule-based speech translation system for a medical domain, and bootstrap corresponding statistical systems. Comparative evaluation reveals that the statistical systems are still slightly inferior to the rule-based ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; however, a hybrid system is able to deliver a small but significant improvement in performance. In conclusion, we suggest that the hybrid architecture we describe potentially allows construction of limited-domain speech translation systems which combine substantial source-language coverage with high-precision translation.
Proceedings of the Workshop on Medical Speech Translation - MST '06, 2006
MedSLT is a unidirectional medical speech translation system intended for use in doctor-patient diagnosis dialogues, which provides coverage of several different language pairs and subdomains. Vocabulary ranges from about 350 to 1000 surface words, depending on the language and subdomain. We will demo both the system itself and the development environment, which uses a combination of rule-based and data-driven methods to construct efficient recognisers, generators and transfer rule sets from small corpora.
In this article, we present a concrete experiment in integrating CALL-SLT, a language-learning software package based on speech recognition, at the University of Bologna (Rayner et al., 2010a, 2010b; Bouillon et al., 2011). We show to what extent student satisfaction correlates with a real improvement in linguistic knowledge. We first present the CALL-SLT system. We then describe the pilot experiment conducted at the University of Bologna with Italian-speaking students learning French, to evaluate qualitatively and quantitatively the contribution of CALL-SLT to language learning.
We describe a prototype platform for creating multilingual voice questionnaires. Content is defined using a simple form-based language with units for questions, question-groups and answers; questionnaire definitions are compiled into efficient speech recognition packages and tables, and the resulting applications can be deployed over the web on both desktop and mobile platforms. We sketch our initial questionnaire application, which is designed for gathering information related to availability of anti-malaria measures in sub-Saharan Africa. It contains 114 question-groups and 218 questions.
Papers by P. Bouillon