North Saami is replacing the use of possessive suffixes on nouns with a morphologically simpler analytic construction. Our data (>2K examples culled from >0.5M words) track this change through three generations and parameters of semantics, syntax, and geography. Intense contact pressure on this minority language probably promotes morphological simplification, yielding an advantage for the innovative construction. The innovative construction is additionally advantaged because it has a wider syntactic and semantic range and is indispensable, whereas its competitor can always be replaced. The one environment where the possessive suffix is most strongly retained even in the youngest generation is in the Nominative singular case, and here we find evidence that the possessive suffix is being reinterpreted as a vocative case marker. The files make it possible to see all of our data and to do the statistical analysis and plots in R.
This paper discusses the development and application of a Constraint Grammar parser for the Plains Cree language. The focus of this parser is the identification of relationships between verbs and arguments. The rich morphology and non-configurational syntax of Plains Cree make it an excellent candidate for the application of a Constraint Grammar parser, which is comprised of sets of constraints with two aims: 1) the disambiguation of ambiguous word forms, and 2) the mapping of syntactic relationships between word forms on the basis of morphological features and sentential context. Syntactic modelling of verb and argument relationships in Plains Cree is demonstrated to be a straightforward process, though various semantic and pragmatic features should improve the current parser considerably. When applied to even a relatively small corpus of Plains Cree, the Constraint Grammar parser allows for the identification of common word order patterns and for relationships between word order a...
The article presents Vuosttas Digisanit (VD), an electronic dictionary from North Sami to Norwegian. Its novelty lies in the way we have utilized existing resources (a basic dictionary and a morphological analyser/generator) in order to create a reception dictionary for language learners for a morphologically rich language. With only 7.9% of the word forms in Sami running text being identical to the lemma form, an approach along the lines sketched here is a prerequisite for a text-integrated e-dictionary. Being a learner dictionary, VD also gives key paradigms for each lemma. These paradigms are generated when building the dictionary, using our language technology tools. We have also built an infrastructure that can be reused for other languages and dictionaries. Our approach shows how it is possible to build text-integrated electronic dictionaries for morphologically complex languages with limited means. The dictionary is available free of charge at: http://giellatekno.uit.no/words/di...
Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for crosslinguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles Universit
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Papers by Lene Antonsen