Robust Fact-Checking under Contaminated Evidence Sources via Claim Decomposition and Dynamic Reweighting
Keywords: Fact Checking, Misinformation Detection, Large Language Model, Contaminated Knowledge Base
TL;DR: We perform claim verification on a contaminated knowledge base and detect misinformation within it via a document weight-updating method.
Abstract: Fact-checking assesses the veracity of claims against a knowledge base from which supporting or refuting evidence can be retrieved. Most existing approaches, however, assume access to a clean and reliable knowledge source. In practice, retrieved evidence is often contaminated with misinformation, which substantially reduces verification accuracy. In this paper, we address fact-checking under contaminated knowledge bases and propose a framework designed to remain robust in noisy environments. Our approach first decomposes each claim into subclaims and retrieves candidate evidence for each subclaim. A large language model (LLM) then classifies each retrieved document as supporting, refuting, or unrelated, and the veracity of each subclaim is determined by the weighted majority stance of its documents. To further enhance robustness, documents are dynamically reweighted: supporting evidence is upweighted as likely truthful, refuting evidence is downweighted as potentially misleading, and the updated weights are incorporated into subsequent retrieval through reranking. To evaluate this setting, we introduce a method for constructing adversarially contaminated knowledge bases by generating misinformation from gold evidence and false claims; the generated misinformation effectively misleads standard retrievers. Experiments across open-source LLMs and datasets show that contamination severely degrades baseline fact-checking performance, while our framework substantially mitigates this effect.
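The weighted majority stance and reweighting loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the function names, the multiplicative update factors (`up=1.1`, `down=0.9`), the default weight of 1.0, and the reranking rule (retriever score times document weight) are all hypothetical. The sketch also reads "supporting" and "refuting" relative to the majority verdict for the subclaim, which is one possible interpretation of the abstract.

```python
from collections import defaultdict

SUPPORT, REFUTE, UNRELATED = "support", "refute", "unrelated"


def weighted_verdict(stances, weights):
    """Return the weighted majority stance for one subclaim.

    stances: dict doc_id -> stance label (assumed to come from an LLM).
    weights: dict doc_id -> current weight (missing ids default to 1.0).
    Returns SUPPORT, REFUTE, or None on a tie / no relevant evidence.
    """
    totals = defaultdict(float)
    for doc_id, stance in stances.items():
        if stance in (SUPPORT, REFUTE):  # unrelated documents do not vote
            totals[stance] += weights.get(doc_id, 1.0)
    if totals[SUPPORT] == totals[REFUTE]:
        return None
    return SUPPORT if totals[SUPPORT] > totals[REFUTE] else REFUTE


def update_weights(stances, weights, verdict, up=1.1, down=0.9):
    """Upweight documents agreeing with the verdict, downweight the rest.

    The multiplicative factors are illustrative assumptions.
    """
    new_weights = dict(weights)
    if verdict is None:
        return new_weights  # no majority, so leave the weights untouched
    for doc_id, stance in stances.items():
        if stance == UNRELATED:
            continue
        factor = up if stance == verdict else down
        new_weights[doc_id] = new_weights.get(doc_id, 1.0) * factor
    return new_weights


def rerank(retrieval_scores, weights):
    """Order doc ids by retriever score scaled by document weight (assumed rule)."""
    return sorted(
        retrieval_scores,
        key=lambda d: retrieval_scores[d] * weights.get(d, 1.0),
        reverse=True,
    )
```

In this sketch, a document that repeatedly disagrees with the majority stance accumulates a lower weight, so it drops in later rerankings even when the retriever scores it highly. Whether the actual framework uses multiplicative updates or a different rule is not stated in the abstract.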
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3506