Keywords: RAG Poison Attack, Trustworthy AI, Retrieval System Poison Attack
TL;DR: We show that RAG poisoning attacks behave very differently when multiple attackers compete, and introduce PoisonArena to evaluate this.
Abstract: Retrieval-Augmented Generation (RAG) systems improve the factual grounding of large language models (LLMs) but remain vulnerable to retrieval poisoning, where adversaries seed the corpus with manipulated content. Prior work largely evaluates this threat under a simplified single-attacker assumption. In practice, however, high-value or high-visibility queries attract multiple adversaries with conflicting objectives. Motivated by real-world cases, we introduce the setting of competing attacks, in which multiple attackers simultaneously attempt to steer the same (or closely related) query toward different targets. We formalize this threat model and propose competitive effectiveness, a metric that quantifies an attacker’s advantage under competition. Extensive experiments show that many strategies that succeed in the single-attacker regime degrade markedly under competition, revealing performance inversions and highlighting the limits of conventional metrics such as attack success rate and F1. Furthermore, we present PoisonArena, a standardized framework and benchmark for evaluating poisoning attacks and defenses under realistic, multi-adversary conditions. Our code is included in the supplementary materials.
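To make the idea of competitive effectiveness concrete, here is a minimal sketch, assuming the metric relates an attacker's success rate when rivals poison the same queries to its success rate when acting alone. The function names and the ratio-based definition are illustrative assumptions, not the paper's published formula.

```python
# Minimal sketch, NOT the paper's formula: we assume "competitive
# effectiveness" compares an attacker's success rate under competition
# to its success rate when attacking alone. Outcome lists hold 1 if the
# attacker's target answer prevailed on a query, else 0.

def attack_success_rate(outcomes):
    """Conventional ASR: fraction of queries the attacker wins."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def competitive_effectiveness(solo_outcomes, contested_outcomes):
    """Assumed ratio form: 1.0 means no degradation under competition."""
    solo = attack_success_rate(solo_outcomes)
    if solo == 0.0:
        return 0.0
    return attack_success_rate(contested_outcomes) / solo

# Toy example: an attack that wins 9/10 queries alone but only 3/10 when
# a rival poisons the same queries toward a conflicting target.
solo = [1] * 9 + [0]
contested = [1] * 3 + [0] * 7
print(attack_success_rate(solo))                    # 0.9
print(attack_success_rate(contested))               # 0.3
print(competitive_effectiveness(solo, contested))   # ~0.33
```

Under this assumed definition, a high solo ASR with a low ratio is exactly the kind of performance inversion the abstract describes: the attack looks strong by conventional metrics but loses most of its advantage once it must compete.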
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15817