To the extent that memorability is one of the poet’s chief (even if unconscious) concerns, poetic composition may be seen as a kind of mnemonic “reverse engineering” that utilizes the very operating procedures of verbal memory. In this article, I focus on the similarities between the cognitive operations involved in the tip-of-the-tongue phenomenon (a frustrating failure to retrieve a known but temporarily unavailable word) and those involved in creating the anagram, a poetic device discovered by Ferdinand de Saussure, in which the phonemes of the important theme word of a poem are dispersed throughout the body of the poem, while the word itself remains unsaid. Both the retrieval of a word on the tip of one’s tongue and the (re)construction of an anagram involve sorting through the phonetic and semantic cues that hint at the absent target word. I suggest that these similarities may be due to the fact that both phenomena are subserved by a common cognitive mechanism: semantic and per...
The article consists of two parts: the first contains our prophecy, which, in keeping with the laws of the genre, is dark and enigmatic. In the second, we attempt to interpret it.
Introduction
The publication of Pushkin's Boris Godunov gave rise to a heated polemic in the criticism of the time. In May 1831 one of the first negative responses to the tragedy--and an especially severe one--appeared in the form of an anonymous pamphlet, On Alexander Pushkin's Boris Godunov, subtitled A Conversation between a Landowner Passing from Moscow through a Provincial Town and a Private Teacher of Russian Literature, Practicing in the Same. (1) Contrary to literary custom at the time, A Conversation did not appear in the pages of a journal, but came out in a separate edition from the printing house of Moscow University and was cleared for publication by the Moscow Censorship Committee. The characters in the pamphlet--a landowner, Petr Ivanovich, and a teacher of Russian literature, Ermil Sergeevich--engage in a discussion of the merits and demerits of Pushkin's latest literary production. The provincial teacher gives a critical reading of Boris Godunov to the Moscow landowner, who agrees, for the most part, with the teacher's vitriolic remarks. The general stylistic mode of A Conversation is one of parody or pastiche. The teacher, who does most of the talking, is a bit of a caricature; his turns of phrase are often grotesquely pedantic and he is not averse to parading his Latin on occasion. The laconic and straightforward landowner is running late and is therefore obliged to rush along his grandiloquent interlocutor. Although the characters are presented in a somewhat ironic vein, their criticisms are apparently meant to be taken at face value. The critical part of the pamphlet consists of what afterwards became the stock repertoire of judgments on the imperfections of Pushkin's tragedy.
The play is disparaged for its lack of believable characters, for its vagueness of genre and lack of a coherent structure, for historical inaccuracies, and for various stylistic flaws. But along with these literary observations, A Conversation also contains some elements of political denunciation, namely, hints at Pushkin's political disloyalty and allegations of his lack of respect for monarchic ideals. Curiously, A Conversation appears to have been the first book (i.e., a separate edition rather than an article) ever published on Pushkin. It was also the subject of the first article (a short review) ever published by the young Vissarion Belinsky. But these two circumstances, as I will try to show, do not yet exhaust its significance for Russian literary history. The fact that the pamphlet was among the earliest critical responses to Boris Godunov, its unusual format, and, most importantly, the bluntness of its accusations, made A Conversation something of a reference point for subsequent criticism of the play. Reviewing the critical reception of Pushkin's tragedy, B. P. Gorodetskii notes: [TEXT NOT REPRODUCIBLE IN ASCII] (2) Indeed, the anti-Pushkin party greeted the pamphlet with approval and praised its author for his wittiness. "G. Z--ia," in the journal Garland (Girlanda), mentions the pamphlet favorably, with only a slight caveat: [TEXT NOT REPRODUCIBLE IN ASCII] (3) And Bestuzhev-Riumin in The Northern Mercury (Severnyi Merkurii) notes: "In their [the Teacher's and the Landowner's] judgments we at first expected to find much provincialism, but instead found much metropolitan wit." (4) At the other end of the literary spectrum, among Pushkin's partisans, A Conversation aroused righteous indignation.
The young Belinsky, in his reviewing debut in The Leaflet (Listok), equates the anonymous author with the notorious graphomaniac Aleksandr Orlov and calls the pamphlet "idle schoolboy talk": [TEXT NOT REPRODUCIBLE IN ASCII] (5) Thus, many people noted and reacted to A Conversation, but almost no one, either in the polemics at the time or in subsequent literary scholarship, has suggested who its author might have been. (6) There seems to be a clue, however, in the passage from Belinsky. …
We propose to map out the history of European thought over the last three centuries using as a proxy the history of changes in 15 editions of Encyclopedia Britannica. Editors of each new edition had to build a new consensus on what to include and what to exclude, how much volume a subject deserves, and what the relations between subjects are. These decisions may be captured and analyzed by methods of natural language processing, network analysis, and information visualization, thus providing tools for identification and analysis of various historical trends within and across domains of knowledge, such as the discussion of theories and ideas, the evolution of concepts, and the growth of reputations.
This work seeks to analyze the dynamics of social or political conflict as it develops over time, using a combination of network-based and language-based measures of conflict intensity derived from social media data. Specifically, we look at the random-walk-based measure of graph polarization, text-based sentiment analysis, and the corresponding shift in word meaning and use by the opposing sides. We analyze the interplay of these views of conflict using the Russia-Ukraine Maidan crisis as a case study.
This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.
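For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Fleiss' kappa can be computed when every post receives the same number of annotations (three in the abstract). The function name `fleiss_kappa`, the label names, and the toy ratings are illustrative assumptions, not taken from the RuSentiment release.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each have the same number of ratings.

    ratings: list of per-item label lists, e.g. [["pos", "pos", "neg"], ...]
    categories: all labels that annotators could assign.
    Note: undefined (division by zero) if every rating is the same label.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: two posts, three annotations each.
toy = [["pos", "pos", "neg"], ["pos", "neg", "neg"]]
print(fleiss_kappa(toy, ["pos", "neg"]))
```

The statistic compares observed pairwise agreement with the agreement expected by chance, so a value of 1 means perfect agreement and values near 0 mean agreement no better than chance.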
Despite the wealth of newly available digital materials, the scope of text-based investigations has mostly been limited to either synchronous or short-term historical analysis. In this paper, we report on the first stage of the project that focuses on tracking long-range historical change, specifically, on the history of ideas and concepts. The project’s aim is to map out the history of representation of knowledge in Europe over the last three centuries using as a proxy the history of changes in historical editions of Encyclopedia Britannica. We describe a series of corpus-analytical tasks necessary for building the analytical and comparative tools for historical analysis using scanned noisy text. In this first stage of the project, we focus specifically on the tools for tracking and visualizing the relative importance of people, interconnections between them, and the rise and fall of their reputations.
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, 2017
This paper addresses the task of identifying the bias in news articles published during a political or social conflict. We create a silver-standard corpus based on the actions of users in social media. Specifically, we reconceptualize bias in terms of how likely a given article is to be shared or liked by each of the opposing sides. We apply our methodology to a dataset of links collected in relation to the Russia-Ukraine Maidan crisis from 2013-2014. We show that on the task of predicting which side is likely to prefer a given article, a Naive Bayes classifier can record 90.3% accuracy looking only at domain names of the news sources. The best accuracy of 93.5% is achieved by a feed-forward neural network. We also apply our methodology to a gold-labeled set of articles annotated for bias, where the aforementioned Naive Bayes classifier records 82.6% accuracy and a feed-forward neural network records 85.6% accuracy.
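As a rough illustration of the domain-name baseline mentioned above, the sketch below trains a Naive Bayes classifier whose only feature is the domain of a shared link. It is an assumption-laden toy, not the paper's implementation: the class name `DomainNaiveBayes`, the add-one smoothing, the side labels, and the example URLs are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased network location (domain) of a URL."""
    return urlparse(url).netloc.lower()


class DomainNaiveBayes:
    """Naive Bayes over a single categorical feature: the link's domain."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength (assumed)
        self.side_counts = Counter()            # articles seen per side
        self.domain_counts = defaultdict(Counter)  # side -> domain -> count
        self.vocab = set()                      # all domains seen in training

    def fit(self, urls, sides):
        for url, side in zip(urls, sides):
            domain = domain_of(url)
            self.side_counts[side] += 1
            self.domain_counts[side][domain] += 1
            self.vocab.add(domain)
        return self

    def predict(self, url):
        domain = domain_of(url)
        total = sum(self.side_counts.values())
        slots = len(self.vocab) + 1  # one extra slot for unseen domains

        def log_score(side):
            prior = math.log(self.side_counts[side] / total)
            likelihood = math.log(
                (self.domain_counts[side][domain] + self.alpha)
                / (self.side_counts[side] + self.alpha * slots)
            )
            return prior + likelihood

        return max(self.side_counts, key=log_score)


# Hypothetical training links, labeled by the side that shared them.
urls = [
    "https://a.test/1", "https://a.test/2", "https://shared.test/1",
    "https://b.test/1", "https://b.test/2", "https://shared.test/2",
]
sides = ["side_A", "side_A", "side_A", "side_B", "side_B", "side_B"]
model = DomainNaiveBayes().fit(urls, sides)
print(model.predict("https://a.test/new-article"))
```

With a single feature, the model reduces to a smoothed estimate of how often each side shares links from a given source, which is one way to read why domain names alone can be informative.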
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2015
In this paper, we investigate the feasibility of using the chronology of changes in historical editions of Encyclopaedia Britannica (EB) to track the changes in the landscape of cultural knowledge, and specifically, the rise and fall in reputations of historical figures. We describe the data-processing pipeline we developed in order to identify the matching articles about historical figures in Wikipedia, the current electronic edition of Encyclopaedia Britannica (edition 15), and several digitized historical editions, namely, editions 3, 9, and 11. We evaluate our results on the tasks of article segmentation and cross-edition matching using a manually annotated subset of 1000 articles from each edition. As a case study for the validity of discovered trends, we use the Wikipedia category of 18th-century classical composers. We demonstrate that our data-driven method allows us to identify cases where a historical figure's reputation experiences a drastic fall or a dramatic recovery, which would allow scholars to further investigate previously overlooked instances of such change.
Our discipline most often deals with the past, less often with the present, and almost never with the future. Yet Alexander Konstantinovich Zholkovsky, as his colleagues, students, and readers know, is distinguished by intellectual sensitivity and an unfading youthful interest in the very newest in culture, in what is happening right now or is just about to happen. We have therefore decided to dedicate to him an article-prediction about the future of philology and literary studies. The article consists of two parts: the first contains our prophecy, which, in keeping with the laws of the genre, is dark and enigmatic. In the second, we attempt to interpret it.
Papers by Mikhail Gronas