Papers by Natalia Perkova
Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022
Latvian Romani is a Northeastern Romani dialect with a limited number of publicly available sources. Two large archival collections of texts in Latvian Romani, compiled primarily in the 1930s in Latvia and Estonia, have been recently digitized as images and made available online for a wider public. In our study, we focus on one of these collections, the Latvian Romani folklore texts collected by Jānis Leimanis in interwar Latvia. In this paper, we describe how initial manual transcriptions, most of which have been created with the help of a special crowdsourcing platform, were integrated in the handwritten text recognition (HTR) workflow in Transkribus. We present two HTR models trained on the basis of Leimanis' collection and discuss various issues related to the work on these texts.
Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5-8, 2019
The paper presents parallel corpora within the Russian National Corpus (RNC) featuring Circum-Baltic/Russian language pairs and describes the choice of texts, morphological annotation and possible applications. The following languages of the Circum-Baltic linguistic area are included in the bilingual pairs of the corpus: Estonian, Finnish, Latvian, Lithuanian, Polish, and Swedish. The corpus includes both fiction and non-fiction texts and has a diachronic dimension. The morphological annotation of the different languages is sensitive to language-specific categories and features. For each language, an expanded RNC tagset is constructed which allows cross-linguistic comparison but at the same time takes differences in grammatical systems into account. The corpora can be used to explore grammatical and lexical features of the Circum-Baltic region that have no straightforward correspondence in Russian and are often rendered by other means. Further expansion of the corpus with non-fiction genres is particularly important for the study of the lexicon and syntax specific to legalese, media or academic style.
Valency Classes of Two-Place Predicates in Structurally Diverse Languages [Валентностные классы двухместных предикатов в разноструктурных языках], ed. by S. S. Say. St. Petersburg: ILS RAS, 2018, pp. 211-224.
This paper presents the current status of the Latvian-Russian parallel corpus, which is an ongoing project within the Russian National Corpus. It discusses the existing parallel corpora including Latvian texts, availability of sources and the main principles and tools of alignment and morphological annotation, as well as further plans for developing the corpus.
Koptjevskaja-Tamm, M. (ed.) The Linguistics of Temperature. John Benjamins, 2015
This study examines the system of terms used to describe temperature in Latvian, with special focus on temperature adjectives as its core. The main aim of the research is to understand how the domain of temperature is conceptualised in Latvian. The semantics and distribution of eleven adjectives are analysed from different points of view in line with a lexical typological approach. The study shows that the system of Latvian basic temperature terms can be revised and re-evaluated as consisting of four terms rather than three (cf. Sutrop 1999). Some aspects of semantic shifts and regular metaphorical patterns in the relevant domain are discussed as well.
С СОБОЙ ('with oneself') in Russian: comitative constructions of caused motion and their properties, 2014
Stockholm University, Stockholm. С СОБОЙ ('with oneself') in Russian: comitative constructions of caused motion and their properties
Review on Multiple Perspectives in Linguistic Research on Baltic Languages, 2012
Conference Presentations by Natalia Perkova
Talk at the international conference "Caritive Constructions in the Languages of the World," Institute of Linguistic Studies of the Russian Academy of Sciences, Saint-Petersburg (online), November 30 - December 2, 2020.
The project website: https://www.caritive.org/
A comitative construction (after Arkhipov 2009) is an asymmetric noun phrase conjunction strategy that has a quadripartite structure.
Presented at the workshop "Grammar of non-standard varieties in the East of the Circum-Baltic area" (University of Tartu, February 1-3, 2018)
A corpus is a linguistically (word-by-word) annotated digital collection of texts. Corpora are compiled by linguists and computer scientists and are used in linguistic, literary and cultural studies, copyediting, L1 and L2 teaching, and machine learning for various kinds of linguistic software (including speech recognition, machine translation, etc.).
Presented at the 47th SLE conference (Poznań, 2014)
Abstracts by Natalia Perkova
Drafts by Natalia Perkova
This paper is an unpublished manuscript written for the volume "Grammatical Relations and their Non-Canonical Encoding in Baltic" (ed. Axel Holvoet and Nicole Nau), 2014 and supported by the project "Valency, Argument Realization and Grammatical Relations in Baltic". The study is based on the questionnaire of another project, run by the Institute for Linguistic Studies in Saint-Petersburg, Russia, see Say, Sergey. 2014. Bivalent verb classes in the languages of Europe. Language Dynamics and Change 4(1): 116–166.
The present English manuscript can be seen as an extended (though also somewhat outdated and lacking some useful references) version of my paper in Russian (Perkova, forthcoming) to be published in the volume summarizing the results of the latter project.
The Lithuanian data come from Natalia Zaika, who collected them separately for the same project.