Abstract: With the iterative upgrades of LLMs, their potential for assisting real-world fact-checking has attracted growing interest. However, their effectiveness in detecting misinformation and providing reliable fact-checking explanations has not been thoroughly explored. To address this gap, we propose a comprehensive framework to evaluate and improve LLMs in real-world fact-checking. First, we introduce \texttt{CANDY}, a benchmark with a structured taxonomy specifically designed to evaluate LLMs' performance in misinformation scenarios. Second, we present \texttt{CANDYSET}, a new dataset that enables a detailed evaluation of LLMs' strengths, weaknesses, and risks in fact-checking tasks. Third, leveraging \texttt{CANDY}, we conduct an in-depth analysis to uncover task-specific limitations of LLMs. Our findings indicate that while the inherent deficiencies of current LLMs hinder real-world fact-checking practices, they also highlight the potential for improving task performance through internal optimization. Our work provides a solid foundation for future research. Data samples can be accessed at \url{https://anonymous.4open.science/status/CANDY-7D2E}.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, fact-checking
Contribution Types: Data resources
Languages Studied: English, Chinese
Submission Number: 8067