SSFO: Self-Supervised Faithfulness Optimization for Retrieval-Augmented Generation

03 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Retrieval-Augmented Generation, Faithfulness hallucination, Likelihood Displacement
Abstract: Retrieval-Augmented Generation (RAG) systems require Large Language Models (LLMs) to generate responses that are faithful to the retrieved context. However, faithfulness hallucination remains a critical challenge, as existing methods often require costly supervision and post-training or impose significant inference burdens. To overcome these limitations, we introduce Self-Supervised Faithfulness Optimization (SSFO), a self-supervised alignment approach for enhancing faithfulness. SSFO constructs preference data pairs by contrasting the model's outputs generated with context versus without context. Leveraging Direct Preference Optimization (DPO), SSFO aligns model faithfulness without incurring labeling costs or additional inference burdens. We analyze this faithfulness alignment process and provide empirical evidence that it leverages a benign form of likelihood displacement, shifting probability mass from tokens driven by parametric knowledge to context-aligned tokens. Based on this insight, we adapt the DPO loss with a weighting scheme that encourages this displacement. Comprehensive evaluations show that SSFO significantly outperforms existing methods, achieving state-of-the-art faithfulness on multiple context-based question-answering datasets. Notably, SSFO exhibits strong generalization, improving cross-lingual faithfulness while preserving general instruction-following capabilities. We release our code at: https://anonymous.4open.science/r/SSFO
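The abstract outlines two core pieces: self-supervised preference-pair construction (same model, same question, with vs. without retrieved context) and a weighted DPO objective. The sketch below illustrates both under stated assumptions: the prompt template, decoding settings, and the scalar `weight` hook standing in for the paper's displacement-encouraging weighting scheme are illustrative placeholders, not the paper's actual implementation (see the released repository for that).

```python
import torch
import torch.nn.functional as F


def build_preference_pair(model, tokenizer, question, context,
                          max_new_tokens=128):
    """Self-supervised pair construction: the with-context output is treated
    as 'chosen', the context-free (parametric-knowledge-only) output as
    'rejected'. The prompt template here is a placeholder, not the paper's."""
    with_ctx = tokenizer(f"Context: {context}\nQuestion: {question}",
                         return_tensors="pt")
    chosen_ids = model.generate(**with_ctx, max_new_tokens=max_new_tokens)
    no_ctx = tokenizer(f"Question: {question}", return_tensors="pt")
    rejected_ids = model.generate(**no_ctx, max_new_tokens=max_new_tokens)
    return chosen_ids, rejected_ids


def weighted_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp,
                      beta=0.1, weight=None):
    """Standard DPO objective over summed sequence log-probabilities.
    `weight` is a hypothetical stand-in for SSFO's displacement-encouraging
    weighting, whose exact form the abstract does not give; None = plain DPO."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    loss = -F.logsigmoid(margin)
    if weight is not None:
        loss = weight * loss
    return loss.mean()


if __name__ == "__main__":
    # Smoke test with dummy sequence log-probabilities for a batch of 4.
    lp = lambda: -torch.rand(4) * 10
    print(weighted_dpo_loss(lp(), lp(), lp(), lp()).item())
```

The with/without-context contrast is what makes the supervision free: no human labels are required, only two decoding passes per question, and training adds no inference-time overhead.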
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 1318