Topic Models
Recent papers in Topic Models
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where...
In this article we will apply various statistical analyses with the R programming language (word frequency, bigrams, word co-occurrence and Topic Modelling) to the bibliographical reviews of 1932-1933 from the Spanish journal Índice...
Submitted for viva-voce examination held at Thiagarajar College of Engineering, Madurai. Acknowledgement: We express our sincere gratitude to our Honorable...
Text classification typically performs best with large training sets, but short texts are very common on the World Wide Web. Can we use resampling and data augmentation to construct larger texts using similar terms? Several current...
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new...
Legal scholars study international courts by analyzing only a fraction of available material, which leaves doubts as to whether their accounts correctly capture the dynamics of international law. In this paper we use dynamic topic...
This is the definition given by New York Times bestselling author Chuck Martin for Artificial Intelligence. To be precise, these devices are learning from the consumers who use them. It was predicted that 2018 would see this learning...
The aim of this chapter is to show some basic methods using R to analyze text content to discover emergent issues and controversies in diverse corpora. As a specific case study, I investigate the culture of microblogging academics...
The goal of this teaching material is to provide a better understanding of the concept of digital humanities from various viewpoints of distinguished scholars in the field, identify the characteristic features of digital humanities...
Recent discussions in the field of Linguistic Landscapes (LL) have emphasized the importance of acknowledging LL 'actors' and their role in interpreting or interacting with language in place (Lou 2016; Banda & Jimaima 2015; Barni & Bagna...
This thesis was to be presented for discussion in the November 2021 master's degree session in "Sociology and Social Research". Owing to the extraordinarily improper conduct of Prof. Bracciale and Dr. ...
Topic modeling has become very popular in digital humanities. It is an easy and very powerful method for getting an overview of the contents of large textual collections, which has made it very appealing to humanists. But applying topic models in such...
The classification of emotions expressed in social media is of great importance for its use in related fields such as media and developing technology. Latent Dirichlet Allocation (LDA), a topic modeling...
We provide a brief, non-technical introduction to the text mining methodology known as topic modeling. We summarize the theory and background of the method and discuss just what kinds of things are found by topic models. Using a text...
Sociolinguistic research has predominately relied on spoken language to understand how social structures influence and are influenced by communication and interaction. This dissertation, however, turns to the increasingly prevalent and...
Recent corpus techniques ask literary analysts to bracket the interpretation of meaning so that we may trace the motions of mind. These techniques allow us to think of the mind as being, in some aspect, a high-dimensional space of verbal...
Topic modeling provides a valuable method for identifying the linguistic contexts that surround social institutions or policy domains. This article uses Latent Dirichlet Allocation (LDA) to analyze how one such policy domain, government...
Simple exploratory text mining and document clustering of journal articles from JSTOR’s Data for Research service. Go to, make a request for data (specify CSV as output format and Word Counts as data type), then once...
The concept of "life" certainly is of some use to distinguish birds and beavers from water and stones. This pragmatic usefulness has led to its construal as a categorical predicate that can sift out living entities from non-living ones...
Legitimacy is a crucial factor determining the success of technologies in the early stages of development and for maintaining resource flows as well as public and political support across the technology life cycle. In sustainability...
Sociology's self-understanding holds that the frequency of economic topics in sociology has peaked twice: first during the classical era between 1890 and 1920, and second after Mark Granovetter’s often-cited 1985 article. This paper tests...
I present a detailed introduction to Topic Models (TM), a family of probabilistic models for (mainly) document modeling. I introduce and motivate the model, and illustrate its applications in Natural Language Processing (NLP), with the...
Political campaigns mostly run parallel to each other during an election cycle, but intersect when the main candidates face off for televised debates. They offer supporters of these candidates a chance to engage with each other while...
The origins of the pejorative conception of the "medieval" must be traced to the Renaissance, when, with the division of historical time, the Middle Ages came to be seen as a "barbarous" period in contrast to the cultural splendor...
Drawing on written sources (archaeological material is not used), the article examines when old age began in medieval Rus'. It finds that old age set in between the ages of 50 and 60. People who lived to 70,...
In this paper the problem of performing external validation of the semantic coherence of topic models is considered. The Fowlkes-Mallows index, a known clustering validation metric, is generalized for the case of overlapping partitions...
The popular Git platforms host large-scale software projects containing large volumes of source code that are difficult to understand during maintenance tasks. Source code comprehension is...
This is the project report of the Network Institute project "Do you see what I am talking about?", which is a follow-up of the earlier project "Polemics Visualized".
The aim of this article is to analyze the discursive background for the characters of teachers in the Soviet school story of the postwar period. The 1.8-million-word corpus for the study was compiled from novels about school and...
Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in...
Since 2015 there has been a surge of academic publications and citations focused on consumer food waste. To introduce a special issue of Appetite focused on the drivers of consumer food waste we perform a transdisciplinary and historical...
Our article presents the aims, functions, and use of the TANIT (Text ANalysIs Tools) system. The aim of the TANIT system is to use computational linguistic processing of Hungarian-language texts for the comparative analysis of documents...
Despite being a relatively new discipline, Chinese Interpreting Studies (CIS) has witnessed tremendous growth in the number of publications and diversity of topics investigated over the past two decades. The number of doctoral...
Can "distant reading" and digital tools enhance the history of technology by revealing hitherto undetected patterns in the record? Using the parliamentary debates of Britain in the nineteenth century, this essay revisits the history of...
The typological distinction between pilgrims and tourists has often been drawn in tourism studies. This article aims at complementing this debate by applying computational techniques to analyse discourses in a corpus of blogs from the...
This paper focuses on developing a hashtag recommendation system for an online social network application with a peer-to-peer infrastructure motivated by the BestPeer++ architecture and the BATON overlay structure. A user may invoke a...
This study introduces a comparative approach to study user comments on the same news content across online platforms while distinguishing between soft and hard news genres. Empirical analysis focuses on Israel’s popular news website Ynet....
With the growth of the internet, short texts such as tweets from Twitter, news titles from RSS feeds, or comments from Amazon have become very prevalent. Many tasks need to retrieve information hidden in the content of short texts. So...