TRUTH: Teaching LLMs to Rerank for Truth in Misinformation Detection

Published: 25 Jul 2025, Last Modified: 12 Oct 2025 | COLM 2025 Workshop SoLaR Poster | CC BY 4.0
Keywords: Reranker, Misinformation Detection
TL;DR: An LLM reranker trained with SFT and DPO to improve misinformation detection
Abstract: Misinformation detection presents a significant challenge due to its knowledge-intensive and reasoning-intensive nature. While Retrieval-Augmented Generation (RAG) systems offer a promising direction, the effectiveness of their retrieval and reranking components is crucial. This paper introduces TRUTH, a novel reranking approach designed for domain adaptation, specifically for misinformation detection, which employs a two-stage training methodology: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). We demonstrate that our 1B parameter TRUTH model achieves strong performance comparable to 7B models on established misinformation benchmarks such as FEVER and Canadian bilingual news datasets, improving retrieval quality and positively impacting downstream task accuracy. Our findings highlight the efficacy of combining SFT for broad knowledge acquisition and domain adaptation with DPO for nuanced reasoning alignment in developing efficient and effective rerankers for complex, knowledge-intensive tasks. Datasets and code will be available with the camera-ready version of the paper.
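To make the two-stage recipe concrete, below is a minimal sketch of SFT followed by DPO using recent versions of the HuggingFace TRL library. The base model name, dataset files, data fields, and hyperparameters are illustrative placeholders, not the paper's actual configuration; the paper's datasets and code are stated to be released with the camera-ready version.

```python
# Hypothetical two-stage training sketch (not the authors' released code):
# Stage 1: SFT for broad knowledge / domain adaptation.
# Stage 2: DPO for preference-based reasoning alignment.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base_model = "meta-llama/Llama-3.2-1B"  # placeholder ~1B causal LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Stage 1: SFT on (claim, evidence, relevance) examples rendered as text.
# Assumed JSONL with a "text" field, e.g.
# {"text": "Claim: ...\nEvidence: ...\nRelevant: yes"}
sft_data = load_dataset("json", data_files="sft_rerank.jsonl", split="train")
sft_trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="truth-sft", num_train_epochs=1),
    train_dataset=sft_data,
)
sft_trainer.train()

# Stage 2: DPO on preference pairs where "chosen" is the evidence judgment
# preferred for verifying the claim and "rejected" is the dispreferred one.
# Assumed JSONL fields: "prompt", "chosen", "rejected".
dpo_data = load_dataset("json", data_files="dpo_rerank_prefs.jsonl", split="train")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    ref_model=None,  # TRL clones the SFT policy as the frozen reference model
    args=DPOConfig(output_dir="truth-dpo", beta=0.1, num_train_epochs=1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```

At inference, the resulting model would score each retrieved passage against the claim and the RAG pipeline would keep the top-ranked evidence before the final verdict is generated; the exact prompting and scoring scheme above is an assumption for illustration only.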
Submission Number: 31