Papers by Violaine Prince
This paper deals with an application for the automatic titling of texts. The process consists of four stages: corpus acquisition, determination of candidate sentences for titling, extraction of noun phrases from the candidate sentences, and finally the choice of the title.
2009 International Multiconference on Computer Science and Information Technology, 2009
Among the crucial issues in UML modeling, one of the most common is the fusion of similar models coming from various sources. Several similar models are created in software engineering, and it is of primary interest to compare them and, when possible, to craft a general model that includes a specific one, or simply to identify models that are in fact equivalent. Most current approaches are based on comparing model structure and on string alignment for attribute and class names. This contribution evaluates the added value of several combined NLP techniques based on lexical networks, POS tagging, and dependency rule application, and how they might improve the fusion of models. Topics: use of NLP techniques in practical applications.
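As a rough illustration of the name-level comparison such a fusion step involves (a minimal sketch, not the system evaluated in the paper), the fragment below combines plain string similarity with a tiny hand-made synonym table standing in for a lexical network; the SYNONYMS table and the 0.7 threshold are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical miniature lexical network: synonym sets for class names.
SYNONYMS = {
    "customer": {"client", "buyer"},
    "order": {"purchase", "command"},
}

def lexical_similarity(name_a: str, name_b: str) -> float:
    """Score two UML class names: 1.0 for synonyms, else string similarity."""
    a, b = name_a.lower(), name_b.lower()
    if a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return 1.0
    return SequenceMatcher(None, a, b).ratio()

def align_classes(model_a, model_b, threshold=0.7):
    """Pair up classes from two models whose names are lexically close."""
    return [(a, b) for a in model_a for b in model_b
            if lexical_similarity(a, b) >= threshold]

print(align_classes(["Customer", "Order"], ["Client", "PurchaseOrder"]))
# -> [('Customer', 'Client')]
```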
2010 IEEE 18th International Conference on Program Comprehension, 2010
Software systems are designed to be used for a significant amount of time; maintenance therefore represents an important part of their life cycle. It has been estimated that much of the time allocated to software maintenance is spent on program comprehension. Many approaches using the program structure or external documentation have been created to ease program comprehension. However, another important source of information is still not widely used for this purpose: the identifiers. In this article, we propose an approach, based on Natural Language Processing techniques, that automatically extracts and organizes concepts from software identifiers into a WordNet-like structure: lexical views. These lexical views give useful insight into the overall software architecture and can be used to improve the results of many software engineering tasks. The proposal is validated on a corpus of 24 open-source software systems.
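To make the identifier-mining step concrete, here is a minimal sketch (not the paper's pipeline) of how identifiers can be split into words and grouped under a shared head word; the grouping rule and the sample identifiers are assumptions for illustration only.

```python
import re
from collections import defaultdict

def split_identifier(identifier: str) -> list[str]:
    """Split a source-code identifier into candidate words
    (handles snake_case, camelCase, PascalCase, digits)."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        # Break camelCase/PascalCase runs: "parseXMLFile" -> parse, XML, File
        words += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [w.lower() for w in words if w]

def group_by_head(identifiers):
    """Toy 'lexical view': group identifiers under their last word,
    which often names the underlying concept (all the *Parser names)."""
    view = defaultdict(list)
    for ident in identifiers:
        words = split_identifier(ident)
        if words:
            view[words[-1]].append(ident)
    return dict(view)

print(group_by_head(["XmlParser", "jsonParser", "parse_file", "FileReader"]))
# -> {'parser': ['XmlParser', 'jsonParser'], 'file': ['parse_file'], 'reader': ['FileReader']}
```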
Lingvisticæ Investigationes Supplementa, 1994
ICSOFT, 2006
In this paper, we describe a model that relies on the following assumption: ontology negotiation and creation are necessary to make knowledge sharing and knowledge management successful through communication. We mostly focus on the modifying process, i.e. dialogue, and we show that a dynamic modification of agents' knowledge bases can occur through message exchanges, messages being knowledge chunks to be mapped onto the agents' KBs. Dialogue takes account of both success and failure in mapping. We show that the same process helps repair its own anomalies. We describe an architecture for agents' knowledge exchange through dialogue. Finally, we conclude on the benefits of introducing dialogue features in knowledge management.
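A toy sketch of the mapping-and-repair idea follows; the Agent class, its naive repair policy, and the example knowledge bases are all invented for illustration, and the paper's architecture is certainly richer.

```python
class Agent:
    """Minimal agent: a knowledge base of (concept -> definition) chunks."""
    def __init__(self, name, kb):
        self.name, self.kb = name, dict(kb)

    def receive(self, concept, definition):
        """Map an incoming knowledge chunk onto the KB: success means the
        chunk is known and consistent; failure triggers a repair step
        (here, simply adopting the sender's definition)."""
        if concept in self.kb and self.kb[concept] != definition:
            reply = f"{self.name}: conflict on '{concept}', adopting yours"
            self.kb[concept] = definition   # naive repair on mapping failure
        elif concept not in self.kb:
            reply = f"{self.name}: learned '{concept}'"
            self.kb[concept] = definition
        else:
            reply = f"{self.name}: agreed on '{concept}'"
        return reply

a = Agent("A", {"ontology": "shared conceptualization"})
b = Agent("B", {"ontology": "domain vocabulary"})
print(b.receive("ontology", a.kb["ontology"]))  # conflict -> repair
```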
Applied Intelligence, 1997
This paper presents a lexical model dedicated to the semantic representation and interpretation of individual words in unrestricted text, where sense discrimination is difficult to assess. We discuss the need for a lexicon including local inference mechanisms and cooperating with as many other knowledge sources (about syntax, semantics, and pragmatics) as possible. We suggest a 'minimal' representation (that is, the smallest representation possible) acting as a bridge
International Journal of Speech Technology, 2008
This paper proposes a topical text segmentation method based on intended boundary detection and compares it to a well-known default boundary detection method, C99. We ran the two methods on a corpus of twenty-two French political speeches, and the results showed that intended boundary detection performs better than default boundary detection on well-structured texts.
Automatic titling of texts is a task that consists in determining a well-formed word group able to represent the text in a relevant way. The main difficulty of this task is to determine a title whose morpho-syntactic characteristics are close to those of titles written by humans. Our approach has to be relevant for all types of texts (e.g. news, emails, forums, and so forth). Our automatic titling method proceeds in four stages: corpus acquisition, determination of candidate sentences for titling, noun phrase extraction from the candidate sentences, and finally, selection of a particular noun phrase to play the role of the text title (the ChTITRES approach). Evaluation shows that the titles determined by our method are relevant.
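To fix ideas, here is a minimal Python skeleton of such a four-stage pipeline; every heuristic in it (first-k sentences, capitalized-run noun phrases, longest-NP selection) is a placeholder assumption, not the ChTITRES method itself.

```python
import re

def candidate_sentences(text: str, k: int = 3) -> list[str]:
    """Stage 2 stand-in: keep the first k sentences, where titles often
    find their material (the real selection criteria are richer)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[:k]

def noun_phrases(sentence: str) -> list[str]:
    """Stage 3 stand-in: crude capitalized-run patterns (a real system
    would rely on a morpho-syntactic parser or POS tagger)."""
    return re.findall(r"(?:[A-Z][\w-]*\s?)+", sentence)

def pick_title(text: str) -> str:
    """Stage 4 stand-in: choose the longest NP among the candidates."""
    nps = [np.strip() for s in candidate_sentences(text) for np in noun_phrases(s)]
    return max(nps, key=len) if nps else ""

print(pick_title("Automatic Titling of Texts is studied. We present a method."))
# -> "Automatic Titling"
```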
Tralogy, 2011 (irevues.inist.fr)
Natural Language Understanding and Cognitive Science, 2005
We propose automated text summarization through sentence compression. Our approach uses the syntactic function of constituents and their position in the sentence's syntactic tree. We first define the notion of a constituent as well as its role as an information provider, before analyzing the content and discourse consistency losses caused by deleting such a constituent. We explain why our method works best with narrative texts. With a rule-based system using SYGFRAN's morpho-syntactic analysis for French [Cha84], we select removable constituents. Our results are satisfactory at the sentence level but less effective at the level of the whole text, a situation we explain by describing the difference in impact between constituents and relations.
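A minimal sketch of rule-based constituent deletion on a toy parse tree follows; the tree encoding, the REMOVABLE label set, and the example sentence are assumptions for illustration, and SYGFRAN's actual analysis is far richer.

```python
# Each node: (label, children) for phrases, (label, word) for leaves.
SENTENCE = ("S", [
    ("NP", [("DET", "the"), ("N", "committee")]),
    ("VP", [("V", "approved"),
            ("NP", [("DET", "the"), ("N", "budget")]),
            ("PP", [("P", "despite"),
                    ("NP", [("ADJ", "strong"), ("N", "objections")])])]),
])

REMOVABLE = {"PP", "ADJ"}  # assumed deletable constituent labels

def compress(node):
    """Rebuild the sentence, dropping constituents judged removable."""
    label, content = node
    if isinstance(content, str):          # leaf: a single word
        return [content]
    words = []
    for child in content:
        if child[0] in REMOVABLE:         # delete the whole constituent
            continue
        words += compress(child)
    return words

print(" ".join(compress(SENTENCE)))  # -> "the committee approved the budget"
```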
International Journal of Intelligent Information Technologies, 2007
LREC, 2008
This paper describes a solution to lexical transfer as a trade-off between a dictionary and an ontology. It shows its association with a translation tool based on morpho-syntactic parsing of the source language. It is based on the English Roget's Thesaurus and its French equivalent, the Larousse Thesaurus, in a computational framework. Both thesauri are transformed into vector spaces, and all monolingual entries are represented as vectors, with 1,000 components for English and 873 for French. The indexing concepts of the respective thesauri are the generating families of the vector spaces. A bilingual data structure transforms French entries into vectors in the English space by using their equivalence representations. Word sense disambiguation consists in choosing the appropriate vector among these 'bilingual' vectors, by computing the contextualized vector of a given word in its source sentence, projecting it into the English vector space, and computing the closest distance to the different entries in the bilingual data structure beginning with the same source string (i.e. French word). The process has been tested on a 20,000-word extract of a French novel, Le Petit Prince, and the lexical transfer results were quite encouraging, with a recall of 86% and a precision of 71%.
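The disambiguation step can be pictured with a toy example; this is a minimal sketch in which the 4-dimensional space and the BILINGUAL table are invented stand-ins for the 1,000-component Roget space and the real bilingual data structure.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy bilingual data structure: French entries carrying a vector
# expressed in a (here 4-dimensional) English concept space.
BILINGUAL = {
    ("avocat", "lawyer"):  np.array([0.9, 0.1, 0.0, 0.0]),
    ("avocat", "avocado"): np.array([0.0, 0.0, 0.8, 0.2]),
}

def transfer(french_word, context_vector):
    """Pick the English translation whose bilingual vector is closest
    (by cosine) to the contextualized vector of the source word."""
    candidates = [(en, vec) for (fr, en), vec in BILINGUAL.items()
                  if fr == french_word]
    return max(candidates, key=lambda c: cosine(context_vector, c[1]))[0]

# A legal context should steer "avocat" toward "lawyer".
print(transfer("avocat", np.array([0.8, 0.2, 0.1, 0.0])))  # -> lawyer
```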
In this article, we propose a system that facilitates information retrieval in a set of textual documents, based on automatic titling (and subtitling). The latter can prove crucial, for example, in the context of web page accessibility (the W3C standard). Our automatic titling process consists in extracting relevant noun phrases from the texts that can serve as candidate titles or subtitles. An original approach combining statistical criteria with word placement in the text then makes it possible to propose relevant titles and subtitles to a user in the form of a table of contents. The user can thus easily survey the full range of subjects addressed in a mass of documents, and easily find the document of interest when needed. An evaluation on real data shows that the solutions provided by our automatic titling approach prove entirely relevant.
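A minimal sketch of the combined frequency-and-position scoring idea follows; the weighting of the two criteria is an invented placeholder, not the scoring actually used in the system.

```python
from collections import Counter

def score_noun_phrases(noun_phrases, text):
    """Combine a frequency criterion with a placement criterion:
    phrases that recur and appear early make better title candidates.
    The 2.0 weight on position is arbitrary for this sketch."""
    counts = Counter(noun_phrases)
    scores = {}
    for phrase in set(noun_phrases):
        first_pos = text.find(phrase)
        position_bonus = (1.0 - first_pos / max(len(text), 1)
                          if first_pos >= 0 else 0.0)
        scores[phrase] = counts[phrase] + 2.0 * position_bonus
    return sorted(scores.items(), key=lambda kv: -kv[1])

text = "Web accessibility matters. Automatic titling helps web accessibility."
print(score_noun_phrases(["Web accessibility", "Automatic titling"], text))
```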
Lecture Notes in Computer Science, 2000
Information retrieval needs to match relevant texts with a given query. Selecting appropriate parts is useful when documents are long and only portions are of interest to the user. In this paper, we describe a method that extensively uses natural language techniques for text segmentation based on topic change detection. The method requires an NLP parser and a semantic representation in Roget-based vectors. We ran the experiment on French documents, for which we have the appropriate tools, but the method could be transposed to any other language meeting the same requirements. The article sketches an overview of the NL understanding environment's functionalities and the algorithms behind our text segmentation method. An experiment in text segmentation is also presented, and its result in an information retrieval task is shown.
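One simple way to realize topic-change detection over sentence vectors is shown below as a hedged sketch; the threshold and the 2-dimensional toy vectors are assumptions, and the paper's Roget-based vectors have far more components.

```python
import numpy as np

def segment(sentence_vectors, threshold=0.4):
    """Place a boundary wherever the cosine similarity between two
    consecutive sentence vectors drops below the threshold."""
    boundaries = []
    for i in range(len(sentence_vectors) - 1):
        u, v = sentence_vectors[i], sentence_vectors[i + 1]
        sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        if sim < threshold:
            boundaries.append(i + 1)   # new segment starts at sentence i+1
    return boundaries

# Two sentences on one topic, then a shift to another topic.
vecs = [np.array([1.0, 0.1]), np.array([0.9, 0.2]), np.array([0.1, 1.0])]
print(segment(vecs))  # -> [2]
```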
2008 International Multiconference on Computer Science and Information Technology, 2008
In this paper, we try to fathom the real impact of corpus quality on method performance and evaluation. The task considered is topic-based text segmentation, and two highly different unsupervised algorithms are compared: C99, a word-based system augmented with LSA, and Transeg, a sentence-based system. Two main characteristics of corpora have been investigated: data quality (clean vs. raw corpora) and corpus manipulation (natural vs. artificial data sets). Corpus size has also been subject to variation, and the experiments reported in this paper show that corpus characteristics highly impact recall and precision values for both algorithms.
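For reference, boundary-level recall and precision can be computed as below; this is an exact-match sketch, whereas published segmentation evaluations often prefer windowed metrics such as WindowDiff.

```python
def boundary_scores(predicted, reference):
    """Exact-match precision/recall on boundary positions."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall

print(boundary_scores(predicted=[2, 5, 9], reference=[2, 6, 9]))  # both ≈ 0.67
```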
In the framework of research on meaning representation in NLP, we focus our attention on thematic aspects and conceptual vectors. The learning strategy associated with conceptual vectors relies on the morpho-syntactic analysis of human usage dictionary definitions linked to vector propagation. This analysis does not currently take negation phenomena into account. This work aims at studying the antonymy aspects of negation, with the larger goal of integrating it into thematic analysis. After a linguistic presentation of antonymy, we present a model based on the idea of symmetry, compatible with conceptual vectors. We then define antonymy functions that allow the construction of an antonymous vector and the enumeration of potentially antonymic lexical items. Finally, we introduce some measure functions, which evaluate how well a given vector might accept an antonym and how acceptable a given word is as an antonym of another term.
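The symmetry idea can be caricatured in a few lines; in this minimal sketch the concept axes and their OPPOSITE pairing are invented, whereas the paper's antonymy functions operate on real conceptual vectors.

```python
import numpy as np

# Toy concept axes and an assumed opposition pairing between them.
AXES = ["existence", "nonexistence", "joy", "sadness"]
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}   # hypothetical symmetry between axes

def antonym_vector(v):
    """Build an 'antonymous' vector by swapping the weight of each
    concept with the weight of its opposite concept."""
    out = np.empty_like(v)
    for i, j in OPPOSITE.items():
        out[j] = v[i]
    return out

happy = np.array([0.1, 0.0, 0.8, 0.1])
print(antonym_vector(happy))  # -> [0.  0.1 0.1 0.8]
```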
Imagerie de la Femme, 2007
The identification of women carrying a genetic mutation, or who have a sufficiently high genetic predisposition to be included in this category, is a determining factor in our ability to change screening strategies. Given the cumulative risk of breast cancer and the anxiety induced in patients undergoing these examinations, it is the radiologist's duty to optimize the choice of available technologies, the frequency of the examinations, and their chronological order. The known tumor and cell-growth specificities in women with mutations can contribute to defining these screening modalities. Drawing on the latest published national recommendations, a critical analysis of the literature, and a review of recent technologies and practices, this article presents and comments on breast cancer screening modalities.
In the framework of research in the field of meaning representation, we focus our attention on the thematic aspects of textual segments represented with conceptual vectors. Conceptual vectors are automatically learned and refined by analyzing human usage dictionary definitions. A kernel of manually indexed terms is needed to bootstrap this analysis. When possible, these definitions are completed with terms related to the defined item. The relations considered are typically instances of lexical functions (Mel'čuk et al., 95), such as hyponymy, hyperonymy, synonymy, and antonymy. This paper proposes, with experimental support, to take advantage of this information to enhance the naive antonymy function proposed in earlier work. The function can self-adjust, by modifying antonym lists as extracted or induced from lexical data. We expose the overall method behind this process and some experimental results, and conclude with some open perspectives.