Abstract: The rapid proliferation of online information has made it increasingly challenging to differentiate factual content from misinformation. Traditional fact-checking methods, which require extensive manual effort, do not scale to the volume of misinformation spreading online. Automated fact-checking has emerged as a promising solution, leveraging machine learning models trained on datasets derived from fact-checking websites (Wang, 2017; Augenstein et al., 2019; Gupta and Srikumar, 2021). However, many of these datasets include post-analysis commentary from annotators, which may introduce bias and provide implicit cues that aid model performance. To address this limitation, we introduce Politi-Fact-Only, a benchmark dataset of 1,482 instances curated from PolitiFact.com, in which we remove post-analysis commentary and retain only factual evidence (Fig. 1). This ensures that models must rely solely on factual reasoning rather than verdict-related information. Our experiments demonstrate that state-of-the-art fact-checking models, including large language models (LLMs), struggle to accurately classify claims when deprived of post-claim analysis, highlighting their reliance on implicit cues rather than pure factual reasoning.
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: fact checking, rumor/misinformation detection
Languages Studied: English
Submission Number: 7826