Fact Check and Social Media Analysis Pipeline
Oana Balalau, Pablo Bertaud-Velten, Younes El-Fraihi, Garima Gaur, Oana
Goga, Samuel Guimaraes, Ioana Manolescu, Brahim Saadi
comprehensively test information retrieval techniques proposed in the literature, on a variety of datasets. Our dataset also contributes to the FCR task, in particular through its coverage of French.

2 Related Work
The need to reuse FCs when analyzing new statements was recognized early on [3]. Accordingly, techniques emerged for retrieving the fact-checked claims most relevant to a given query (tweet or claim) [2, 4, 15, 17, 20]. All of these adopt the standard (retriever, reranker) architecture proposed for information retrieval tasks. A retriever is used to efficiently select, for a given input, a subset of a potentially very large FC corpus considered closest to it; then, a second, potentially more expensive method is used to re-rank the retrieved results. For both stages, there are term-based (probabilistic) models and neural methods. Among the probabilistic retrievers, BM25 [16] is the most frequent, and it performed well in many studies, e.g., [4, 17]. Neural retrievers and re-rankers are often based on Transformer networks, which capture matching signals between words in the FC and the query. In Table 1, we present the datasets and ranking methods used in prior works. We note slight discrepancies in the best-performing models; this could be due to the different evaluation settings, which underlines the importance of a unified evaluation and the relevance of our system.
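To make the two-stage (retriever, reranker) architecture concrete, the following is a minimal sketch, not the implementation of any of the cited systems: it assumes the rank_bm25 and sentence-transformers libraries, a hypothetical list fc_texts of FC claim texts, and a publicly available cross-encoder checkpoint as the neural re-ranker.

# Minimal retrieve-then-rerank sketch (illustrative only).
# Assumes: pip install rank_bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

fc_texts = ["Claim reviewed by fact-checkers ...", "..."]  # hypothetical FC corpus

# Stage 1: term-based retrieval with BM25 over whitespace-tokenized FCs.
bm25 = BM25Okapi([t.lower().split() for t in fc_texts])

def retrieve(query: str, k: int = 50) -> list[int]:
    scores = bm25.get_scores(query.lower().split())
    return sorted(range(len(fc_texts)), key=lambda i: scores[i], reverse=True)[:k]

# Stage 2: re-rank the candidates with a Transformer cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidate_ids: list[int]) -> list[int]:
    pairs = [(query, fc_texts[i]) for i in candidate_ids]
    scores = reranker.predict(pairs)
    order = sorted(zip(candidate_ids, scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in order]

# Usage: top-10 FCs for a claim, out of the 50 BM25 candidates.
# top = rerank("5G towers cause illness", retrieve("5G towers cause illness", 50))[:10]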
Table 1: Partial snapshot of the front-runners in the area, adapted from [13]

Dataset | Type | Languages | Source | # pairs | Evaluation | Best Model
[17] | Claim-Tweet Pairs | en | X, Snopes, PolitiFact, ClaimsKG | 1,768 | MRR, MAP@k, Precision@k | BM25 (Top50) + Sentence BERT
[7] | Claim Pairs | en, hi, bn, ml, ta | WhatsApp | 2,343; 398 (en) | Accuracy, Precision, Recall, F1 score | I-XLM-R
[8] | FC-Tweet Pairs | en, hi, es, pt | X, GFC Tools | 6,533; 4,850 (en) | Accuracy, F1 score, MRR, MAP@k | Full-length BM25
[12] | Claim-Thread Pairs | 41 (en, fr, ...) | X, GFC tools | 26,048; 42.88% (en), 3.46% (fr) | F1 score | Adapted GraphSAGE model
[11] | Claim-Tweet Pairs | ar, en | X, Snopes, AraFacts, ClaimsKG | 2,518; 1,610 (en) | MRR, MAP@k, Hit@k | Sentence T5 + GPT Neo
[15] | FC-Post Pairs | 27 (en, fr, ...) | X, Meta, GFC tools | 31,305; 7,307 (en), 2,146 (fr) | Hit@k | GTR-T5 (en), MPNet
[18] | FC-Tweet Pairs | en, hi, es, pt | X, Boomlive, AFP, EFE, PolitiFact | 1,600; 400 (en) | MRR, MAP@k | BM25 (Top 200) + LaBSE (BERT)
[4] | Claim-Tweet Pairs | en | [17] (Snopes) | 1,000 | MRR, MAP@k, Hit@k | BM25 (Top 100) + BERT
Claim normalization is used to improve the linguistic quality of claims and to help retrieve FCs [19]. Other related lines of work focus on identifying check-worthy claims [1]. ClaimsKG [6] and CimpleKG [14] are corpora of fact-checks together with associated claims; the recent [15] is multilingual.

In this context, the interest of FactCheckBureau is to enable researchers and technicians working in fact-checking organizations to build and personalize their pipelines, and to experiment and analyze with different modules.

3 FactCheckBureau at a glance
We describe the data sources we work on (Sec. 3.1), and the user interaction modes with our tool (Sec. 3.2).

3.1 Inputs: fact-checks and claims
Conceptually, there are two main entities: fact-checks and claims. In principle, a claim can be a social media post, an image specifying a claim, or a simple text phrase. In the corpus we built for our demo (detailed in Section 4), claims are tweets. Therefore, our claim is characterized by its accountHandle (the Twitter account having published the tweet), text, date, language, hashTags, and URLtoEmbeddedMediaContent. The attributes of a fact-check are derived from the ClaimReview schema that many FC agencies adopt in their FC articles (https://schema.org/ClaimReview); ClaimReview was promoted by Google, which used to show related FCs next to search results (footnote 4). Specifically, the attributes of an FC are: title, claimant, publisher, dateOfPublication, URLtoArticle, claimText, language and rating. The relationship between these two key entities is captured by a many-to-many relation claimAboutFC(claim-id, FC-id): a claim and an FC are paired in this way if (according to a specific automated or manual decision method) the FC is about the claim. We also say the claim and the FC are aligned.

4 That has since been discontinued, among other reasons, because some of the shown FCs were not semantically close enough to the respective search results [9]. This highlights the importance of the FC retrieval problem.
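As an illustration of the data model above (a sketch, not the system's actual schema), the two entities and the alignment relation could be represented as follows; the field names simply mirror the attributes listed in Section 3.1, and the ids in the example pair are hypothetical.

# Illustrative sketch of the claim / fact-check data model described above;
# not FactCheckBureau's actual implementation.
from dataclasses import dataclass, field

@dataclass
class Claim:                          # in our demo corpus, a tweet
    claim_id: str
    account_handle: str               # accountHandle: Twitter account that published it
    text: str
    date: str
    language: str
    hash_tags: list[str] = field(default_factory=list)
    url_to_embedded_media_content: str = ""

@dataclass
class FactCheck:                      # attributes derived from schema.org/ClaimReview
    fc_id: str
    title: str
    claimant: str
    publisher: str
    date_of_publication: str
    url_to_article: str
    claim_text: str
    language: str
    rating: str

# Many-to-many relation claimAboutFC(claim-id, FC-id): the FC is about the claim.
claim_about_fc: set[tuple[str, str]] = set()
claim_about_fc.add(("tweet-123", "fc-456"))   # hypothetical aligned pair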
3.2 Users' interactions with FactCheckBureau
The core task to be solved in FC retrieval pipelines is: given a claim (also called query) and an FC corpus, find the FCs most relevant to the claim. Specifically, each FC retrieval pipeline contains a candidate FC retrieval module and a candidate ranking module. If the claim is text, it can be used as such, but other formats may require some pre-processing, based on the query type, before being entered into the text-based retrieval pipeline. For instance, if the claim is an image, the text needs to be extracted by OCR, or the image can be captioned; if the claim is a tweet, pre-processing may remove or split hashtags into individual words, normalize numerical data, transcribe emojis to text, etc. To be comprehensive, FactCheckBureau models FC retrieval pipelines as consisting of three stages: pre-processing, retrieval, and ranking.

FactCheckBureau has two main operation modes: deployment and development, shown in Figure 1; dark navy modules are used in deployment, whereas in development, all the modules (both navy and light blue) may be involved. In development mode, it supports designing, inspecting, and comparing FC retrieval pipelines; in deployment mode, a retrieval pipeline can be deployed and used to query the FC corpus. As explained below, our demonstration will showcase the four use cases (design, inspect, compare, and deploy).

Inspect. A user builds a retrieval pipeline by choosing or loading: pre-processing modules; a retrieval module; and a ranking module. The user also supplies aligned pairs, and chooses the metric(s) to use to evaluate the quality of the pipeline. Since relevant FC retrieval is a ranked-list search problem, we support the familiar Mean Average Precision (MAP@k), Mean Reciprocal Rank (MRR@k), Normalized Discounted Cumulative Gain (NDCG@k), and Hits@k. The user triggers the evaluation of the pipeline (top snippet in Figure 2); for each query, this leads to a list of FCs ordered by their relevance, as computed by the pipeline.
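For concreteness, here is a minimal sketch of two of these ranked-list metrics (not FactCheckBureau's code): it assumes, per query, a ranked list of FC ids produced by a pipeline and the set of FC ids aligned with that query in the ground truth, and uses one common normalization convention for MAP@k.

# Illustrative MRR@k and MAP@k over ranked FC lists (sketch only).

def mrr_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant FC within the top-k (0 if none)."""
    for rank, fc_id in enumerate(ranked[:k], start=1):
        if fc_id in relevant:
            return 1.0 / rank
    return 0.0

def ap_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Average precision over the top-k results (one common convention)."""
    hits, precision_sum = 0, 0.0
    for rank, fc_id in enumerate(ranked[:k], start=1):
        if fc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k) if relevant else 0.0

def mean_over_queries(per_query_scores: list[float]) -> float:
    return sum(per_query_scores) / len(per_query_scores) if per_query_scores else 0.0

# Usage: MRR@10 for one query whose aligned FC "fc-2" is ranked third -> 1/3.
# mrr_at_k(["fc-9", "fc-7", "fc-2"], {"fc-2"}, k=10)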
Figure 1: FactCheckBureau architecture in the two use modes (Development and Deployment)
FactCheckBureau also presents the values of the chosen metric(s) for different cut-off values k.

For further inspection, a user can choose the deep-dive option, where FactCheckBureau enables the user to inspect the test samples on which the specified pipeline performed poorly. It also reports the performance of each model in the pipeline in isolation.
• Input: Choose models, aligned pairs, metric(s);
• Output: Computed metric(s) (plots and tables), the performance of the candidate selector alone, and the top 5 worst-performing examples.

Compare. While developing a retrieval solution, evaluating and comparing different models is essential to find the best possible combination of models for candidate retrieval and ranking. The compare mode (middle in Figure 2) enables a user to compare previously saved pipelines and/or newly specified ones. The user obtains a consolidated performance report comparing all the specified pipelines under a set of chosen metrics. It also provides a deep-dive option, to compare the models' performance on selected test samples.
• Input: Choose pipelines from a list of pipelines;
• Output: A single plot of overall performance, a plot of candidate identifier performance, and the 5 worst-performing instances.

Design. This option is for users who do not intend to develop a pipeline but need to use one. The user can supply an FC corpus, or a default one (the one we prepare for the demo, Section 4) can be used. The user specifies the claim language (or we can auto-detect it), and the claim type (post, image, or text).
• Input: Specify query type (post, image, text), and dataset language;
• Output: A recommended pipeline based on (i) the most frequently used components for these inputs, or for comparable inputs (same language, same query type) if there is no history of running on the same inputs; (ii) simple rules to choose the necessary pre-processing models based on the input type.

Deploy. FactCheckBureau can be used as a search interface (bottom in Figure 2) for finding the relevant FCs for an input query or, alternatively, for a specific topic, specified as a short phrase, e.g., "Covid". The user chooses a previously specified retrieval pipeline already present in the system, and configures the number of relevant documents she wants to retrieve. Then, FactCheckBureau returns a list of the FCs relevant to the claim or, respectively, the FCs about the given topic.
• Input: Choose a pipeline, or select "auto-design"; specify the number of relevant documents to retrieve;
• Output: Querying interface – querying through a post, an image, text, or a topic (the latter uses the available FC tags).

4 Dataset and FC pipeline
Dataset. We built a corpus of 218K claim reviews in 14 languages published by 83 fact-checking agencies recognized as verified signatories by the IFCN (International Fact-Checking Network). Further, we collected 9.1K tweets mentioned in various FC articles and 8K recent tweets from prominent Members of the European Parliament (MEPs).

We used the Google Fact-Check API (footnote 5) for collecting FCs. The returned data follows the ClaimReview schema, but with some fields omitted as described in Google's documentation (footnote 6). The returned data also includes the URL of the FC article. We also collected the FC article text using these URLs to enrich our dataset. Some reputed FC agencies, like Le Monde, do not publish their FCs via the Google FC Explorer, or stopped doing so at some point; therefore, we crawled their web pages to collect their FCs. For social media post collection, we used a paid subscription to X to gather the 8K tweets from 402 MEPs. We focused on social media posts in English and French FC articles for the aligned pair collection. We collected tweets mentioned in 4.7K English FCs, 1.2K French FCs and 3.2K FCs in other languages. This resulted in 9.1K aligned pairs of social media posts and relevant FC articles, as many FCs have two tweets related to them. For some of our FC retrieval experiments described next, we consider these pairs to be the ground truth, allowing the search of an FC given its paired tweet. For others, the text of the claim described in the ClaimReview schema of the FC is used as its ground-truth pair.

Pipeline. We preload FactCheckBureau with our proposed FC retrieval pipeline that supports the default query interface for non-technical users. In our FC retrieval setting, the collection of FC articles serves as the document corpus and tweets serve as the input queries. We experiment with around 41K articles in English and French. Our pipeline starts with pre-processing a tweet by removing links, emojis, escape, control and special characters, standardizing characters with more than one Unicode representation, normalizing numbers and dates using the num2words (footnote 7) and dateparser (footnote 8) libraries, and tokenizing text using the spaCy (footnote 9) tokenizer. We employed the well-established BM25 [16] for candidate retrieval.
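A minimal sketch of such a pre-processing step is given below. The exact cleaning rules and regular expressions are assumptions for illustration, not the authors' implementation; the sketch reuses the num2words, dateparser and spaCy libraries named above and, as in the pipeline description, drops emojis rather than transcribing them. The cleaned tokens can then feed a BM25 retriever over the FC article corpus, as in the retrieve-then-rerank sketch in Section 2.

# Illustrative tweet pre-processing along the lines described above
# (the specific rules are assumptions, not FactCheckBureau's code).
# Assumes: pip install num2words dateparser spacy
import re
import unicodedata

import dateparser
import spacy
from num2words import num2words

nlp = spacy.blank("en")  # tokenizer-only spaCy pipeline, no model download needed

def normalize_date(match: re.Match) -> str:
    """Rewrite a matched date into a canonical day-month-year form."""
    parsed = dateparser.parse(match.group())
    return parsed.strftime("%d %B %Y") if parsed else match.group()

def preprocess_tweet(text: str) -> list[str]:
    text = re.sub(r"https?://\S+", " ", text)        # remove links
    text = re.sub(r"#(\w+)", r"\1", text)            # keep hashtag words, drop the '#'
    text = unicodedata.normalize("NFKC", text)       # single Unicode representation
    # drop control/format characters and emoji-like symbols
    text = "".join(c for c in text if unicodedata.category(c) not in {"Cc", "Cf", "So"})
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", normalize_date, text)   # dates
    text = re.sub(r"\b\d+\b", lambda m: num2words(int(m.group())), text)  # numbers -> words
    return [tok.text.lower() for tok in nlp(text) if not tok.is_space]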
5 https://toolbox.google.com/factcheck/apis
6 https://tinyurl.com/25t28phf
7 https://pypi.org/project/num2words/
8 https://pypi.org/project/dateparser/
References
[1] Fatma Arslan, Naeemul Hassan, Chengkai Li, and Mark Tremayne. 2020. A Benchmark Dataset of Check-worthy Factual Claims. In 14th International AAAI Conference on Web and Social Media. AAAI.
[2] Alberto Barrón-Cedeño, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov, Bayan Hamdan, Alex Nikolov, Shaden Shaar, and Zien Sheikh Ali. 2020. Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. In CLEF. Springer. https://doi.org/10.1007/978-3-030-58219-7_17
[3] Sylvie Cazalens, Philippe Lamarre, Julien Leblay, Ioana Manolescu, and Xavier Tannier. 2018. A Content Management Perspective on Fact-Checking. In The Web Conference. ACM. https://doi.org/10.1145/3184558.3188727
[4] Tanmoy Chakraborty, Valerio La Gatta, Vincenzo Moscato, and Giancarlo Sperlì. 2023. Information retrieval algorithms and neural ranking models to detect previously fact-checked information. Neurocomputing 557 (2023), 126680. https://doi.org/10.1016/j.neucom.2023.126680
[5] FullFact. 2020. The challenges of online fact checking. https://fullfact.org/media/uploads/coof-2020.pdf
[6] Susmita Gangopadhyay, Katarina Boland, Danilo Dessì, Stefan Dietze, Pavlos Fafalios, Andon Tchechmedjiev, Konstantin Todorov, and Hajira Jabeen. 2023. Truth or Dare: Investigating Claims Truthfulness with ClaimsKG. In Linked Data-driven Resilience Research, Vol. 3401. https://ceur-ws.org/Vol-3401/paper7.pdf
[7] Ashkan Kazemi, Kiran Garimella, Devin Gaffney, and Scott Hale. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In IJCNLP. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.347
[8] Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Scott A. Hale, and Rada Mihalcea. 2022. Matching tweets with applicable fact-checks across languages. In De-Factify: Workshop on Multimodal Fact Checking and Hate Speech Detection (with AAAI).
[9] Emma Lurie and Eni Mustafaraj. 2020. Highly Partisan and Blatantly Wrong: Analyzing News Publishers' Critiques of Google's Reviewed Claims. In Truth and Trust Online Conference. Hacks/Hackers. https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO07.pdf
[10] Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated Fact-Checking for Assisting Human Fact-Checkers. In IJCAI. https://doi.org/10.24963/ijcai.2021/619
[11] Preslav Nakov, Giovanni Da San Martino, Firoj Alam, Shaden Shaar, Hamdy Mubarak, and Nikolay Babulkov. 2022. Overview of the CLEF-2022 CheckThat! lab task 2 on detecting previously fact-checked claims. (2022).
[12] Dan S. Nielsen and Ryan McConville. 2022. MuMiN: A large-scale multilingual multimodal fact-checked misinformation social network dataset. In SIGIR.
[13] Rrubaa Panchendrarajan and Arkaitz Zubiaga. 2024. Claim detection for automated fact-checking: A survey on monolingual, multilingual and cross-lingual research. Natural Language Processing Journal 7 (2024), 100066. https://doi.org/10.1016/j.nlp.2024.100066
[14] Youri Peskine, Raphaël Troncy, and Paolo Papotti. 2024. CimpleKG: a Continuously Updated Knowledge Graph of Fact-Checks and Related Misinformation. In Infox sur Seine workshop.
[15] Matúš Pikuliak, Ivan Srba, Róbert Móro, Timo Hromadka, Timotej Smolen, Martin Melisek, Ivan Vykopal, Jakub Simko, Juraj Podrouzek, and Mária Bieliková. 2023. Multilingual Previously Fact-Checked Claim Retrieval. In EMNLP. https://doi.org/10.18653/V1/2023.EMNLP-MAIN.1027
[16] Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3, 4 (2009). https://doi.org/10.1561/1500000019
[17] Shaden Shaar, Nikolay Babulkov, Giovanni Da San Martino, and Preslav Nakov. 2020. That is a Known Lie: Detecting Previously Fact-Checked Claims. In ACL. https://doi.org/10.18653/V1/2020.ACL-MAIN.332
[18] Iknoor Singh, Carolina Scarton, Xingyi Song, and Kalina Bontcheva. 2023. Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning. arXiv:2308.05680 (2023).
[19] Megha Sundriyal, Tanmoy Chakraborty, and Preslav Nakov. 2023. From Chaos to Clarity: Claim Normalization to Empower Fact-Checking. In EMNLP Findings. https://doi.org/10.18653/V1/2023.FINDINGS-EMNLP.439
[20] Nguyen Vo and Kyumin Lee. 2020. Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News. In EMNLP. https://doi.org/10.18653/V1/2020.EMNLP-MAIN.621
[21] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359 (2018). https://doi.org/10.1126/science.aap9559