Papers by Nedjma OUSIDHOUM
arXiv (Cornell University), May 16, 2024
Trigger points are a concept introduced by Mau et al. [30] to study qualitative focus group interviews and understand polarisation in Germany. When people communicate, trigger points represent moments when individuals feel that their understanding of what is fair, normal, or appropriate in society is being questioned. In the original studies, individuals show strong, negative emotional responses when certain triggering words or topics are mentioned. In this paper, we introduce the first systematic study of the large-scale effect of individual words as trigger points by analysing a large collection of social media posts. We examine online deliberations on Reddit between 2020 and 2022 and collect more than 100 million posts from subreddits related to a set of words identified as trigger points in UK politics. We find that such trigger words affect user engagement and have noticeable consequences for how online discussions unfold. Trigger words cause animosity and incentivise hate speech, adversarial debates, and disagreements. Our work is the first to introduce trigger points to computational studies of online communication. The findings are relevant to researchers interested in affective computing and online deliberation, and to those examining how citizens debate politics and society in light of affective polarisation.
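A minimal sketch of the kind of keyword matching a collection step like this might start from, assuming plain-text posts and a simple trigger-word list; the words below are illustrative placeholders, not the set used in the paper.

    import re
    from collections import Counter

    TRIGGER_WORDS = ["brexit", "immigration", "gender"]  # hypothetical examples
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, TRIGGER_WORDS)) + r")\b", re.IGNORECASE
    )

    def count_trigger_mentions(posts):
        """Count how often each trigger word is mentioned across post texts."""
        counts = Counter()
        for text in posts:
            for match in pattern.findall(text):
                counts[match.lower()] += 1
        return counts

    posts = ["Brexit changed everything.", "The debate on immigration was heated."]
    print(count_trigger_mentions(posts))  # Counter({'brexit': 1, 'immigration': 1})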
arXiv (Cornell University), Mar 27, 2024
We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages.
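STR systems are typically scored by how well their predicted relatedness values rank sentence pairs against human judgements; a minimal sketch using Spearman rank correlation, with made-up scores, follows.

    from scipy.stats import spearmanr

    gold_scores = [0.10, 0.40, 0.75, 0.90]    # human relatedness judgements in [0, 1]
    system_scores = [0.20, 0.35, 0.80, 0.85]  # a model's predicted scores

    correlation, p_value = spearmanr(gold_scores, system_scores)
    print(f"Spearman correlation: {correlation:.3f}")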
Automated fact-checking is often presented as an epistemic tool that fact-checkers, social media consumers, and other stakeholders can use to fight misinformation. Nevertheless, few papers thoroughly discuss how. We document this by analysing 100 highly-cited papers and annotating epistemic elements related to intended use, i.e., means, ends, and stakeholders. We find that narratives leaving out some of these aspects are common, that many papers propose inconsistent means and ends, and that the feasibility of suggested strategies rarely has empirical backing. We argue that this vagueness actively hinders the technology from reaching its goals, as it encourages overclaiming, limits criticism, and prevents stakeholder feedback. Accordingly, we provide several recommendations for thinking and writing about the use of fact-checking artefacts. The majority of papers envision these artefacts as epistemic tools to limit misinformation; in our analysis, we find that 82 out of 100 automated fact-checking papers are motivated as such. Unfortunately, many papers only discuss how this will be achieved in vague terms: authors argue that automated fact-checking will be used against misinformation, but not how or by whom (see Figure 1). Connecting research to potential use allows researchers to shape their work in ways that take into account the expressed needs of key stakeholders, such as professional fact-checkers (Nakov et al., 2021). It also enables critical work (Haraway, 1988) and facilitates thinking about unintended shortcomings, e.g., dual use (Leins et al., 2020).
Fact-checking requires retrieving evidence related to a claim under investigation. The task can be formulated as question generation based on a claim, followed by question answering. However, recent question generation approaches assume that the answer is known and is typically contained in a passage given as input, whereas such passages are precisely what is being sought when verifying a claim. In this paper, we present Varifocal, a method that generates questions based on different focal points within a given claim, i.e., different spans of the claim and its metadata, such as its source and date. Our method outperforms previous work on a fact-checking question generation dataset across a wide range of automatic evaluation metrics. These results are corroborated by our manual evaluation, which indicates that our method generates more relevant and informative questions. We further demonstrate the potential of focal points in generating sets of clarification questions for product descriptions.
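A minimal sketch of what "focal points" could look like in practice, assuming spans of the claim are taken as noun chunks (via spaCy) and combined with metadata fields; this illustrates the idea, not the authors' implementation.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def focal_points(claim, source, date):
        """Return candidate focal points: claim spans plus metadata values."""
        doc = nlp(claim)
        spans = [chunk.text for chunk in doc.noun_chunks]
        return spans + [source, date]

    points = focal_points(
        "The minimum wage rose by 10% in 2021.",
        source="example.org", date="2021-05-01",
    )
    # Each focal point would then condition a question generator,
    # e.g. a seq2seq model given (claim, focal point) pairs.
    print(points)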
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
arXiv (Cornell University), Apr 27, 2023
Automated fact-checking is often presented as an epistemic tool that fact-checkers, social media consumers, and other stakeholders can use to fight misinformation. Nevertheless, few papers thoroughly discuss how. We document this by analysing 100 highly-cited papers and annotating epistemic elements related to intended use, i.e., means, ends, and stakeholders. We find that narratives leaving out some of these aspects are common, that many papers propose inconsistent means and ends, and that the feasibility of suggested strategies rarely has empirical backing. We argue that this vagueness actively hinders the technology from reaching its goals, as it encourages overclaiming, limits criticism, and prevents stakeholder feedback. Accordingly, we provide several recommendations for thinking and writing about the use of fact-checking artefacts.
arXiv (Cornell University), Apr 13, 2023
arXiv (Cornell University), Feb 17, 2023
arXiv (Cornell University), Oct 22, 2022
Fact-checking requires retrieving evidence related to a claim under investigation. The task can be formulated as question generation based on a claim, followed by question answering. However, recent question generation approaches assume that the answer is known and is typically contained in a passage given as input, whereas such passages are precisely what is being sought when verifying a claim. In this paper, we present Varifocal, a method that generates questions based on different focal points within a given claim, i.e., different spans of the claim and its metadata, such as its source and date. Our method outperforms previous work on a fact-checking question generation dataset across a wide range of automatic evaluation metrics. These results are corroborated by our manual evaluation, which indicates that our method generates more relevant and informative questions. We further demonstrate the potential of focal points in generating sets of clarification questions for product descriptions.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Large pre-trained language models (PTLMs) have been shown to carry biases towards different social groups, which leads to the reproduction of stereotypical and toxic content by major NLP systems. We propose a method based on logistic regression classifiers to probe English, French, and Arabic PTLMs and quantify the potentially harmful content that they convey with respect to a set of templates. Each template is prompted with the name of a social group followed by a cause-effect relation. We use PTLMs to predict masked tokens at the end of a sentence in order to examine how likely they are to enable toxicity towards specific communities. We shed light on how such negative content can be triggered within unrelated and benign contexts based on evidence from a large-scale study, then explain how our methodology can be used to assess and mitigate the toxicity transmitted by PTLMs.
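A minimal sketch of the masked-token probing the abstract describes, using the Hugging Face fill-mask pipeline; the template and model choice here are illustrative, not the paper's exact setup.

    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Template: a social group followed by a cause-effect relation,
    # with a masked completion at the end of the sentence.
    template = "People from this group are angry because they [MASK]."
    for prediction in fill_mask(template, top_k=5):
        print(f"{prediction['token_str']:>12}  p={prediction['score']:.3f}")
    # The predicted completions and their probabilities can then be fed to a
    # classifier, e.g. logistic regression over toxicity labels, as in the paper.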
Search algorithms play an indispensable role in information retrieval. In this paper, we define a new indexing and search algorithm for Classical Arabic that exploits the specificities of Arabic phonetics. Our algorithm is based on the classical Soundex algorithm: we investigate Classical Arabic phonetic features in order to represent word phonemes adequately. The proposed algorithm would thus create a suitable environment for relevant information retrieval. It would be useful for building phonetic dictionaries and for encoding queries as effective preprocessing in a spelling corrector.
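A simplified sketch of the classical Soundex algorithm the paper builds on, shown here for Latin script; the paper's contribution is an analogous code table derived from Classical Arabic phonetics, which is not reproduced here.

    # Classical Soundex groups similar-sounding consonants under one digit.
    SOUNDEX_CODES = {
        **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
        **dict.fromkeys("DT", "3"), "L": "4",
        **dict.fromkeys("MN", "5"), "R": "6",
    }

    def soundex(word):
        """Encode a word as its first letter plus three digits."""
        word = word.upper()
        code = word[0]
        prev = SOUNDEX_CODES.get(word[0], "")
        for char in word[1:]:
            digit = SOUNDEX_CODES.get(char, "")
            if digit and digit != prev:
                code += digit
            prev = digit
        return (code + "000")[:4]

    print(soundex("Robert"), soundex("Rupert"))  # R163 R163: they collide, as intended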
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Lecture Notes in Computer Science, 2013
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)