Keywords: Contamination, Fact-Checking, Information Retrieval, Benchmark
Abstract: Evaluating information retrieval in agentic systems is increasingly difficult due to model contamination and the tight coupling between retrieval and interleaved agent reasoning. Large language models may recall fact-checking knowledge from pretraining, while agents shape queries in ways that confound retrieval evaluation, causing standard end-to-end evaluations to yield conclusions that do not generalize across agentic architectures or datasets. We introduce a contamination-aware evaluation framework for retrieval in agentic fact-checking that fixes the language model and corpus and evaluates retrieval across diverse agent-retriever interaction settings, enabling controlled analysis of how contamination and query generation affect retrieval quality independently of downstream reasoning. Our experiments show that contamination impacts retrieval behavior, that retriever rankings are unstable across agentic systems due to query-retrieval interaction effects, and that different choices of how nDCG values are aggregated can lead to qualitatively different and even reversed comparisons between agents. For datasets with silver documents, we propose nDEv2R, a rank-sensitive, fact-level retrieval metric that remains informative under incomplete evidence supervision. While instantiated in fact-checking, our findings apply more broadly to evaluating retrieval components embedded in agentic systems such as question answering and multi-document reasoning.
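To make the aggregation claim concrete, here is a minimal Python sketch of how pooling per-query nDCG scores across all queries (micro-averaging) versus averaging within each claim first (macro-averaging) can reverse which agent looks better. All agent names, claims, and scores below are hypothetical illustrations, not values from the paper.

```python
# Toy numbers, invented purely for illustration; nothing here reproduces the
# paper's agents, datasets, or reported scores. Each agent issues a different
# number of retrieval queries per claim, so pooling all queries (micro) and
# averaging within each claim first (macro) weight claims differently.
per_query_ndcg = {
    "agent_A": {"claim_1": [0.90], "claim_2": [0.20, 0.30, 0.25]},
    "agent_B": {"claim_1": [0.60], "claim_2": [0.50, 0.50, 0.50]},
}

for agent, claims in per_query_ndcg.items():
    pooled = [s for scores in claims.values() for s in scores]
    micro = sum(pooled) / len(pooled)  # query-level average over the pool
    macro = sum(sum(s) / len(s) for s in claims.values()) / len(claims)  # claim-level
    print(f"{agent}: micro={micro:.3f}, macro={macro:.3f}")

# With these numbers, micro ranks agent_B above agent_A (0.525 vs 0.413),
# while macro reverses the order (agent_A 0.575 vs agent_B 0.550).
```

The reversal arises because an agent that decomposes a claim into many sub-queries contributes more terms to the micro average, so its performance on multi-query claims dominates under one aggregation scheme but not the other.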
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Contamination, Fact-Checking, Information Retrieval, Benchmark
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1344