Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization
Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating their parametric knowledge with external retrieved content. However, knowledge conflicts caused by internal inconsistencies or noisy retrieved content can severely undermine the generation reliability of RAG systems. In this work, we argue that LLMs should rethink all evidence, including both retrieved content and internal knowledge, before generating responses. We propose **CARE-RAG** (**C**onflict **A**ware and **R**eliable **E**vidence for RAG), a novel framework that improves trustworthiness through conflict-driven summarization of all available evidence. CARE-RAG first derives parameter-aware evidence by comparing parameter records to identify diverse internal perspectives. It then refines the retrieved evidence to produce context-aware evidence, removing irrelevant or misleading content. To detect and summarize conflicts, we distill a LLaMA 3.2 3B model to perform conflict-driven summarization, enabling reliable synthesis across multiple sources. To further ensure evaluation integrity, we introduce a QA Repair step to correct outdated or ambiguous benchmark answers. Experiments on revised QA datasets with retrieval data show that CARE-RAG consistently outperforms strong RAG baselines, especially in scenarios with noisy or conflicting evidence.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: commonsense QA, logical reasoning, knowledge base QA, interpretability, reasoning, open-domain QA
Languages Studied: English
Keywords: Large Language Models, Retrieval-Augmented Generation
Submission Number: 2392
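
The following is a minimal sketch of how the pipeline described in the abstract could be wired together, assuming generic `llm` and `summarizer` callables and illustrative prompt templates; the function names, prompts, and interfaces below are placeholders, not the paper's actual implementation.

```python
from typing import Callable, List

# All helper names and prompts below are illustrative assumptions; the paper's
# actual prompt templates and distilled-model interface are not specified here.

def parameter_aware_evidence(question: str, llm: Callable[[str], str],
                             n_records: int = 3) -> List[str]:
    """Sample several internal 'parameter records' from the generator LLM,
    kept as candidate internal perspectives on the question."""
    prompt = f"Answer from your own knowledge, with a brief justification:\n{question}"
    return [llm(prompt) for _ in range(n_records)]

def context_aware_evidence(question: str, passages: List[str],
                           llm: Callable[[str], str]) -> List[str]:
    """Refine retrieved passages: drop those judged irrelevant or misleading."""
    kept = []
    for p in passages:
        verdict = llm(
            f"Question: {question}\nPassage: {p}\n"
            "Is this passage relevant and not misleading? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(p)
    return kept

def conflict_driven_summary(question: str, internal: List[str], external: List[str],
                            summarizer: Callable[[str], str]) -> str:
    """Ask a small distilled summarizer (e.g., a 3B model) to surface conflicts
    between internal and retrieved evidence and produce a reconciled summary."""
    prompt = (
        f"Question: {question}\n"
        "Internal evidence:\n- " + "\n- ".join(internal) + "\n"
        "Retrieved evidence:\n- " + "\n- ".join(external) + "\n"
        "Identify any conflicts between the two sources and write a short, "
        "reliable summary noting which claims are supported by which evidence."
    )
    return summarizer(prompt)

def care_rag_answer(question: str, passages: List[str],
                    llm: Callable[[str], str],
                    summarizer: Callable[[str], str]) -> str:
    """End-to-end sketch: internal evidence -> refined retrieval -> conflict-aware
    summary -> final answer conditioned on the summary."""
    internal = parameter_aware_evidence(question, llm)
    external = context_aware_evidence(question, passages, llm)
    summary = conflict_driven_summary(question, internal, external, summarizer)
    return llm(
        f"Question: {question}\nEvidence summary:\n{summary}\nAnswer concisely:"
    )
```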