Improving the fact-checking performance of language models by relying on their entailment ability

ACL ARR 2025 May Submission3042 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Automated fact-checking is a crucial task in the digital age. To verify a claim, current approaches largely follow one of two strategies: (i) relying on the embedded knowledge of language models, or (ii) fine-tuning them with evidence pieces. While the former can make systems hallucinate, the latter has not been very successful to date. The primary reason is that fact verification is a complex process: language models have to parse multiple pieces of evidence before making a prediction, and the evidence pieces often contradict each other, which makes the reasoning even more complex. We propose a simple yet effective approach that relies on the entailment and generative abilities of language models to produce “supporting” and “refuting” justifications (for a claim to be true). We trained language models on these justifications and achieved superior results. In addition, we present a systematic comparison of different prompting and fine-tuning strategies, which is lacking in the literature. Some of our findings: (i) training language models with raw evidence sentences yielded an improvement of up to 8.20% in macro-F1 over the best-performing baseline on the RAW-FC dataset; (ii) similarly, training language models with prompted claim-evidence understanding (TBE-2) yielded an improvement of up to 16.39% over the baselines on the same dataset; (iii) training language models with entailed justifications (TBE-3) outperformed the baselines by a large margin (up to 28.57% and 44.26% for LIAR-RAW and RAW-FC, respectively). We share our code repository to reproduce our results.
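The core idea described in the abstract (entailment-based justifications, TBE-3) can be sketched as follows: prompt a language model once for a justification of why the evidence supports the claim and once for why it refutes it, then train a verdict model on both justifications. This is a minimal sketch of that idea; the prompt wording, the gpt2 stand-in model, and the justifications helper are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of entailment-based justification generation (TBE-3 style).
# Assumptions (not from the paper): prompt wording, gpt2 as a stand-in LM,
# and the function name `justifications`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in language model

def justifications(claim: str, evidence: list[str]) -> dict:
    """Prompt the LM for a 'supports' and a 'refutes' justification of a claim."""
    context = " ".join(evidence)
    out = {}
    for stance in ("supports", "refutes"):
        prompt = (
            f"Evidence: {context}\n"
            f"Explain how the evidence {stance} the claim: {claim}\n"
            "Justification:"
        )
        # Generate a short justification for this stance.
        out[stance] = generator(prompt, max_new_tokens=64)[0]["generated_text"]
    return out

# Example usage: both justifications would then be paired with the claim
# and used as training input for a verdict classifier.
pair = justifications(
    "The city reduced traffic deaths by 40% in 2020.",
    ["Official statistics report a 12% decline in traffic fatalities in 2020."],
)
```

Generating both stances gives the downstream classifier explicit contrastive signals, which is one plausible way to cope with the mutually contradicting evidence pieces that the abstract identifies as the main difficulty.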
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Fact checking, Entailment, Prompting, Fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Keywords: Fact-checking, Fine-tuning, Prompting
Submission Number: 3042