slovar-slovenskega-knjiznega-jezika-2/. SSKJ2 = Slovar slovenskega knjižnega jezika, druga, dopolnjena in deloma prenovljena izdaja [Dictionary of the Slovenian Standard Language, 2nd, expanded and partly revised edition] (2014): https://fran.si/130/sskjslovar-slovenskega-knjiznega-jezika/. SSSJ = Sprotni slovar slovenskega jezika [Growing Dictionary of the Slovenian Language] (2014-): https://fran.si/132/sprotni-sprotni-slovar-slovenskega-jezika.
The present paper examines a variety of ways in which the Corpus of Contemporary Romanian Language (CoRoLa) can be used. Numerous examples highlight the wide range of query possibilities that CoRoLa opens up for different types of users. The queries shown here are run through the KorAP frontend, using the Poliqarp query language. They target the annotation layers (lexical, morphological and, in the near future, syntactic) as well as the metadata. Other issues discussed are how to build a virtual corpus, how to deal with errors, how to find expressions and how to identify expressions
The paper reports difficulties encountered during the alignment of synsets between English and Romanian. These difficulties stem from inconsistencies found both in Princeton WordNet (henceforth PWN) and in our sources, and from the different criteria on which senses were
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. It is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. It was delivered via an online survey platform, in language versions specific to each target country, and was completed by 9,562 respondents, over 300 per country on average. The survey consisted of a general section, which was translated and presented to all participants, and country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.
1 Introduction
Research into dictionary use has become increasingly important in recent years. Unlike 15 years ago, new findings in this area are now presented every year, e.g. at every Euralex or eLex conference. These studies range from questionnaire or log-file studies to smaller-scale studies focussing on eye tracking, usability, or other aspects of dictionary use measurable in a lab. For an overview of different studies,
In the context of the globalised Information Society and the variety of solutions for the computer-aided acquisition of traditional dictionaries, the paper presents the current stage of development of the new series of the Romanian Dictionary edited by the Romanian Academy. Through a project financed by the National University Research Council of Romania, some preliminary steps toward the computer-aided acquisition of the dictionary have been made; they are outlined in this article.
This work represents a first step toward reconstructing a diachronic morphology for Romanian. The main resource used in this task is the digital version of the Romanian Language Thesaurus Dictionary (eDTLR). For its entries, this resource provides usage examples in the form of citations extracted from old and modern Romanian texts. The concept of "word deformation" is introduced and divided into several categories. The research aims to detect one type of deformation occurring in the citations: changes confined to the root of old word forms, without migration to another paradigm. An algorithm is presented that automatically infers old root forms and is based on a paradigmatic data model of current Romanian morphology. Given the inferred roots and the paradigms they belong to, the old inflected forms of the words can be deduced. Moreover, by exploiting the chronology of the citations, the inferred old word forms can be framed in cer...
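The root-inference idea described above can be illustrated with a minimal sketch. It assumes a paradigm is stored as a plain tuple of endings attached to a root, loosely modelled on a masculine noun such as "lup" (lup, lupul, lupi, ...). The names `MASC_NOUN_ENDINGS`, `candidate_roots`, `infer_old_root` and `generate_forms` are hypothetical, and the old form "lopului" is invented for illustration, not an attested Romanian form. Choosing the candidate root most similar to the current root is an assumed heuristic, not the authors' criterion.

```python
from difflib import SequenceMatcher

# Hypothetical paradigm: endings attached to a root ("" = bare root).
# Loosely modelled on a masculine noun such as "lup"; a real morphological
# dictionary would store many such paradigms.
MASC_NOUN_ENDINGS = ("", "ul", "i", "ii", "ului", "ilor")


def candidate_roots(form, endings):
    """Strip every paradigm ending the form ends with; keep non-empty roots."""
    roots = []
    for ending in endings:
        if form.endswith(ending):
            root = form[: len(form) - len(ending)]
            if root and root not in roots:
                roots.append(root)
    return roots


def infer_old_root(form, current_root, endings):
    """Return a deformed (old) root for `form`, or None if there is no deformation.

    The paradigm is kept fixed, mirroring the restriction to root changes
    "without migration to another paradigm".
    """
    roots = candidate_roots(form, endings)
    if not roots or current_root in roots:
        return None  # ordinary form of the current root, or not analysable
    # Assumed heuristic: the old root is the candidate closest to the current root.
    return max(roots, key=lambda r: SequenceMatcher(None, r, current_root).ratio())


def generate_forms(root, endings):
    """Rebuild the full set of (old) inflected forms from a root and its paradigm."""
    return [root + ending for ending in endings]


old_root = infer_old_root("lopului", "lup", MASC_NOUN_ENDINGS)
print(old_root)  # lop
print(generate_forms(old_root, MASC_NOUN_ENDINGS))
```

Because the paradigm stays the same, a single attested old form is enough to regenerate every old inflected form of the word, including forms never seen in the citations.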
The paper argues in favour of an electronic form of the thesaurus dictionary of the Romanian language, the dictionary edited by the Romanian Academy in two editions since 1913. Preliminary steps such as scanning, optical character recognition and pre-processing have already been carried out. The paper presents a prototype for correcting the digital form of the dictionary. The numerous advantages of a digital thesaurus dictionary are discussed as a basis for future work in Romanian lexicography and, more generally, in language processing. Key words: resources.
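The abstract does not describe how the correction prototype works. As a rough illustration only, a common baseline for post-OCR correction flags tokens missing from a lexicon and proposes the nearest lexicon entries. The sketch below substitutes Python's `difflib.get_close_matches` for whatever matching the prototype actually uses. The lexicon, the function name `suggest_corrections` and the 0.7 cutoff are hypothetical. The misread "rornână" imitates the familiar OCR confusion of "m" with "rn".

```python
import difflib
import re


def suggest_corrections(text, lexicon, n=3, cutoff=0.7):
    """Map each token not found in `lexicon` to its closest lexicon entries.

    Stand-in technique (difflib similarity), not the paper's prototype.
    """
    suggestions = {}
    for token in re.findall(r"\w+", text):  # \w also matches ă, â, î, ș, ț
        word = token.lower()
        if word in lexicon:
            continue  # known word: leave it alone
        suggestions[token] = difflib.get_close_matches(word, lexicon, n=n, cutoff=cutoff)
    return suggestions


LEXICON = {"limba", "română", "dicționar"}  # toy lexicon for illustration
print(suggest_corrections("limba rornână", LEXICON))  # {'rornână': ['română']}
```

In a real correction workflow, the suggestions would be shown to an editor rather than applied automatically, since rare or archaic headwords are often absent from a modern lexicon.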
This paper presents a first step towards constructing a diachronic Romanian morphology. First, the notion of word "deformation" is introduced and a classification of such deformations is proposed. The research aims to detect deformations in the roots of inflected words (nouns, adjectives and verbs). The algorithm we present uses two important resources: a morphological dictionary of the current Romanian language, which also models the inflectional paradigms of the language, and eDTLR, the digital version of the Romanian Thesaurus Dictionary. In eDTLR, each headword is associated with a set of citations extracted from Romanian literature, each annotated with its year of publication. The algorithm detects root deformations by comparing word forms of the current language with forms extracted from the eDTLR citations. For every root change, the deformed root is deduced and all the diachronic forms are inferred. Also, using the chronology of citations,...
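The comparison and dating steps described above can be sketched as follows. The sketch assumes that each citation carries just a word form and a publication year. Every citation form absent from the current paradigm is flagged, and its attestation span is taken as the first and last year in which it occurs. The `Citation` type, the paradigm tuple and the years are all invented toy data, not eDTLR content, and grouping by form rather than by inferred root is a simplification.

```python
from typing import NamedTuple

# Hypothetical paradigm of the current language, as in a morphological dictionary.
MASC_NOUN_ENDINGS = ("", "ul", "i", "ii", "ului", "ilor")


class Citation(NamedTuple):
    """A toy eDTLR-style citation: a word form and its year of publication."""
    year: int
    form: str


def current_forms(root, endings):
    """All forms of the current language generated from a root and its paradigm."""
    return {root + ending for ending in endings}


def date_deformations(citations, known_forms):
    """Return {form: (first_year, last_year)} for forms unknown to the current language."""
    spans = {}
    for c in citations:
        if c.form in known_forms:
            continue  # matches the modern paradigm: not a deformation
        first, last = spans.get(c.form, (c.year, c.year))
        spans[c.form] = (min(first, c.year), max(last, c.year))
    return spans


# Invented data for illustration only.
CITATIONS = [
    Citation(1688, "lopul"),
    Citation(1750, "lopului"),
    Citation(1705, "lopul"),
    Citation(1900, "lupul"),
]
print(date_deformations(CITATIONS, current_forms("lup", MASC_NOUN_ENDINGS)))
# {'lopul': (1688, 1705), 'lopului': (1750, 1750)}
```

Grouping the flagged forms by their inferred old root instead of by surface form would let one attestation span cover the whole old paradigm.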
The Doctoral Consortium at EUROLAN-2015 will provide an opportunity for graduate students enrolled in PhD studies in Computational Linguistics, Natural Language Processing and the Semantic Web to present their current work and receive constructive feedback and guidance from both the general audience of the Summer School and the invited lecturers. Opinions expressed freely and in a friendly atmosphere will help presenters to correct modeling errors or misconceptions in the early phases of their PhD research, to rehearse their final defense in front of an international panel of experts, to improve the evaluation and comparison of their results with the state of the art, to find new ideas for continuing their investigations, to enhance their presentation skills and, not least, as has often been the case at earlier meetings of this kind, to establish future collaborations. http://eurolan.info.uaic.ro/2015/events/doctoral-consortium/
Papers by Gabriela Haja