In this paper, we discuss the theoretical, sociolinguistic, methodological and technical objectives and issues of the French Creagest Project (2007-2012) in setting up, documenting and annotating a large corpus of adult and child French Sign Language (LSF) and of natural gestural language. In Section 2, we address theoretical and practical issues, emphasizing the outstanding features of the Creagest Project. In Section 3, we discuss methodological issues for data collection. Finally, in Section 4, we cover technical aspects of LSF video data editing and corpus annotation, with a view to setting up a corpus-based formalized description of LSF.
The resource presented in this article combines a corpus of deverbal nouns with semantic and syntactic annotation, based on the French Treebank, and an electronic lexicon providing morphological, syntactic and semantic information on the nouns found in the corpus as well as on the verbs from which they are derived.
Proceedings of the First International Workshop on Lexical Resources, WoLeR 2011, 2011
The ANR-funded Nomage project aims to describe empirically the aspectual properties of deverbal nouns taken from a corpus. It is centered on the development of two resources: a semantically and syntactically annotated corpus of deverbal nouns based on the French Treebank, and an electronic lexicon providing descriptions of the morphological, syntactic and semantic properties of the deverbal nouns found in our corpus. Both resources are presented in this paper, with a focus on the comparison between corpus data and lexicon data.
The ongoing Nomage project aims to describe the aspectual properties of deverbal nouns empirically. It is centered on the development of two resources: a semantically annotated corpus of deverbal nouns, and an electronic lexicon. Both are presented in this paper, which emphasizes how the semantic annotations of the corpus allow the lexicographic description of deverbal nouns, in particular their polysemy, to be validated.
Corpus-based Sign Language linguistics has emerged as a new linguistic domain, and as a consequence large-scale, controlled video data repositories are under construction for different Sign Languages. Nevertheless, as pointed out by Johnston (2008), no unified annotation scheme is yet available, which compromises any chance of comparing or reusing corpora across research teams. A related issue is the comparability of descriptions and formalizations between SL linguistics and mainstream linguistics. In this paper, we address the definition of a common annotation scheme for the annotation, distribution, exchange and comparison of Sign Language corpora. In Section 2, we discuss the challenge of building interoperable corpora for corpus-based linguistics. We also examine existing annotation schemes and strategies proposed for SL linguistics. In Section 3, we propose a small set of annotation tiers, based on Frame Semantics, as a common annotation scheme. We also propose adding text-level as well as utterance-level metadata to this common annotation scheme, in order to broaden the range of future uses of SL corpora.
In this paper, we present evidence from a case study in LSF, conducted on narratives from six adult signers. In this study, picture and video stimuli were used in order to identify the role of non-manual features such as gaze, facial expressions and mouth features. We discuss the importance of mouth features as markers of the alternation between frozen signs (Lexical Units, LU) and productive signs (Highly Iconic Structures, HIS). Based on qualitative and quantitative analysis, we propose to consider mouth features, i.e. mouthings on the one hand and mouth gestures on the other, as markers of LU and HIS respectively. As such, we propose to consider mouthings and mouth gestures as fundamental cues for determining the nature, role and interpretation of manual signs, in conjunction with other non-manual features. We propose an ELAN annotation template for mouth features in Sign Languages, together with a discussion of the different mouth features and their respective roles as discourse and syntactic-semantic operators.
The resource presented in this paper combines a semantically and syntactically annotated corpus of deverbal nouns based on the French Treebank, and an electronic lexicon providing descriptions of the morphological, syntactic and semantic properties of the deverbal nouns found in our corpus and of their verbal sources.
The TALC-sef corpus (pos-TAgged Literary Corpus, Serbian-English-French), initiated within two research projects (2007-2009 and 2010-2011) involving the universities of Lille 3, Artois and Belgrade, was originally designed as a parallel reference corpus of literary translations for Serbian, English and French. In addition to sentence-level alignment, performed automatically and validated manually, part-of-speech morpho-syntactic annotations had been added from the earliest versions of the corpus, fully automatically for the French and English sub-corpora, using the tagging models of the Treetagger (Schmid, 1994). However, for lack of usable resources for Serbian, these annotations could not be carried out. In the remainder of this article, we detail the methodology adopted to define a set of syntactic tags for Serbian, as well as the technical and linguistic choices we made in order to enrich the Serbian sub-corpus with syntactic annotations comparable to those of the French and English sub-corpora, so as to allow parallel searches for co-occurrences of syntactic tags in the three languages. We then discuss the syntactic tagging performance recorded with three freely available taggers: TnT (Brants, 2000), Treetagger (Schmid, 1994) and BTagger (Gesmundo & Samardžić, 2012), before turning to the prospects for exploiting the resulting syntactic tagging for other levels of annotation, highlighting the contribution of this multilingual corpus to French linguistics.
Because of the ever-increasing amount of information in patients' EHRs, healthcare professionals may have difficulty making diagnoses and/or therapeutic decisions. Moreover, patients may misunderstand their health status. Medical practitioners need effective tools to locate relevant elements within a patient's EHR in real time and to visualize them according to synthetic and intuitive presentation models. The RAVEL project aims to achieve this goal through a high-profile industrial research and development program on the EHR covering the following areas: (i) semantic indexing, (ii) information retrieval, and (iii) data visualization. The RAVEL project is expected to implement a generic prototype, loosely coupled to data sources, so that it can be transposed to the information systems of different university hospitals.
Papers by Antonio Balvet