CoFact: Conformal Factuality Guarantees for Language Models under Distribution Shift

ICLR 2026 Conference Submission 17164 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Conformal prediction, Hallucination, LLM reliability
Abstract: Large Language Models (LLMs) excel at natural language processing (NLP) tasks but often generate false or misleading information, known as hallucinations, raising reliability concerns in high-stakes applications. To provide statistical guarantees on the factuality of LLM outputs, conformal prediction-based techniques have been proposed. Despite their strong theoretical guarantees, these methods rely heavily on the exchangeability assumption between calibration and test data, which is frequently violated in real-world scenarios with dynamic distribution shifts. To overcome this limitation, we introduce \textbf{CoFact}, a conformal prediction framework that uses online density ratio estimation to adaptively reweight calibration data, keeping it aligned with the evolving test distribution. With this approach, CoFact bypasses the exchangeability requirement and provides robust factuality guarantees under non-stationary conditions. To theoretically justify CoFact, we establish an upper bound on the gap between the actual hallucination rate and the target level $\alpha$, and show that this bound asymptotically approaches zero as the number of rounds and calibration samples increases. Empirically, CoFact is evaluated on MedLFQA, WikiData, and the newly introduced \textbf{WildChat+} dataset, which captures real-world distribution shifts through user-generated prompts. Results show that CoFact consistently outperforms existing methods, maintaining reliability even under dynamic conditions.
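The mechanism the abstract describes is weighted conformal calibration: calibration examples are reweighted by estimated density ratios between the test and calibration distributions before the quantile threshold used to filter claims is computed. The sketch below illustrates only that generic idea under stated assumptions; the per-claim scorer, the density-ratio weights, and the filtering rule are hypothetical placeholders, not the paper's actual CoFact procedure.

```python
import numpy as np

def weighted_conformal_threshold(cal_scores, cal_weights, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    cal_weights are assumed density-ratio estimates w_i ~ p_test(x_i) / p_cal(x_i);
    with uniform weights this reduces to standard split conformal calibration.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)

    order = np.argsort(cal_scores)
    scores, weights = cal_scores[order], cal_weights[order]

    # Normalize weights; for simplicity the unseen test point gets unit weight
    # (a simplification of weighted conformal prediction under covariate shift).
    total = weights.sum() + 1.0
    cum = np.cumsum(weights) / total

    # Smallest calibration score whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha)
    if idx >= len(scores):
        return np.inf  # insufficient calibration mass: retain everything
    return scores[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-claim "non-factuality" scores from an external scorer.
    cal_scores = rng.beta(2, 5, size=500)
    # Hypothetical density-ratio weights from an online estimator.
    cal_weights = np.exp(rng.normal(0.0, 0.3, size=500))

    tau = weighted_conformal_threshold(cal_scores, cal_weights, alpha=0.1)
    test_claim_scores = rng.beta(2, 5, size=8)
    retained = [s for s in test_claim_scores if s <= tau]
    print(f"threshold tau = {tau:.3f}, retained {len(retained)}/8 claims")
```

In this sketch, claims whose score falls below the weighted threshold are kept in the output; raising the weights of calibration points that resemble the current test distribution shifts the threshold accordingly, which is the role the abstract attributes to online density ratio estimation.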
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17164