TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

AAAI 2026 Workshop TrustAgent, Submission 45

Published: 20 Nov 2025, Last Modified: 09 Mar 2026
AAAI 2026 TrustAgent Workshop (Oral)
License: CC BY 4.0
Keywords: Retrieval-Augmented Generation; Large Language Model; Adversarial Attacks
TL;DR: The paper introduces a two-stage method aimed at improving the trustworthiness of RAG systems.
Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. These systems, however, remain susceptible to a range of RAG attacks that can severely degrade LLM performance. To address this challenge, we propose TrustRAG, a robust framework that systematically filters malicious and irrelevant content and uses knowledge resolution to dynamically balance internal and external knowledge, ensuring trustworthy and reliable generation. Our approach employs a two-stage defense mechanism. The first stage applies a cluster-filtering strategy to efficiently remove injected documents that surface as groups of mutually similar attack patterns. The second stage adds a self-assessment process that leverages the reasoning capabilities of LLMs to resolve both intra-document factual conflicts and inconsistencies between internal and external knowledge, dynamically choosing the source in which the final answer is grounded. TrustRAG is a plug-and-play, training-free module that integrates seamlessly with any open- or closed-source language model. Extensive experiments show that TrustRAG significantly improves efficiency and robustness against diverse RAG attacks, including in real-world scenarios.
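The first-stage cluster filtering can be illustrated with a minimal sketch: injected adversarial passages are often near-duplicates of one another, so documents that form tight embedding-similarity clusters are treated as suspect. This is a hypothetical toy implementation, not the paper's actual method; the function name `cluster_filter` and the threshold value are assumptions for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_filter(embeddings, threshold=0.9):
    """Keep indices of documents with no near-duplicate neighbour.

    Hypothetical sketch of cluster filtering: any document whose
    embedding is mutually similar to another retrieved document
    (cosine >= threshold) is assumed to belong to an injected
    cluster and is dropped.
    """
    kept = []
    for i, u in enumerate(embeddings):
        near_dups = sum(
            1 for j, v in enumerate(embeddings)
            if j != i and cosine(u, v) >= threshold
        )
        if near_dups == 0:
            kept.append(i)
    return kept
```

In practice the embeddings would come from the RAG system's own retriever; the key design choice is that benign retrieved passages are diverse, while coordinated injections cluster tightly.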
Submission Number: 45