Keywords: RAG Poison Attack, Trustworthy AI, Retrieval System Poison Attack
TL;DR: We show that RAG poisoning attacks behave very differently when multiple attackers compete, and introduce PoisonArena to evaluate this.
Abstract: Retrieval-Augmented Generation (RAG) systems improve the factual grounding of large language models (LLMs) but remain vulnerable to retrieval poisoning, where adversaries seed the corpus with manipulated content. Prior work largely evaluates this threat under a simplified single-attacker assumption. In practice, however, high-value or high-visibility queries attract multiple adversaries with conflicting objectives. Motivated by real-world cases, we introduce the setting of competing attacks, in which multiple attackers simultaneously attempt to steer the same (or closely related) query toward different targets. We formalize this threat model and propose competitive effectiveness, a metric that quantifies an attacker’s advantage under competition. Extensive experiments show that many strategies that succeed in the single-attacker regime degrade markedly under competition, revealing performance inversions and highlighting the limits of conventional metrics such as attack success rate and F1. Furthermore, we present PoisonArena, a standardized framework and benchmark for evaluating poisoning attacks and defenses under realistic, multi-adversary conditions. Our code is included in the supplementary materials.
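To make the idea of competitive effectiveness concrete, here is a minimal sketch, assuming the metric relates an attacker's success rate when rivals poison the same queries to its success rate when acting alone. The function names and the ratio-based definition are illustrative assumptions, not the paper's published formula.

```python
# Minimal sketch, NOT the paper's formula: we assume "competitive
# effectiveness" compares an attacker's success rate under competition
# to its success rate when attacking alone. Outcome lists hold 1 if the
# attacker's target answer prevailed on a query, else 0.

def attack_success_rate(outcomes):
    """Conventional ASR: fraction of queries the attacker wins."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def competitive_effectiveness(solo_outcomes, contested_outcomes):
    """Assumed ratio form: 1.0 means no degradation under competition."""
    solo = attack_success_rate(solo_outcomes)
    if solo == 0.0:
        return 0.0
    return attack_success_rate(contested_outcomes) / solo

# Toy example: an attack that wins 9/10 queries alone but only 3/10 when
# a rival poisons the same queries toward a conflicting target.
solo = [1] * 9 + [0]
contested = [1] * 3 + [0] * 7
print(attack_success_rate(solo))                    # 0.9
print(attack_success_rate(contested))               # 0.3
print(competitive_effectiveness(solo, contested))   # ~0.33
```

Under this assumed definition, a high solo ASR with a low ratio is exactly the kind of performance inversion the abstract describes: the attack looks strong by conventional metrics but loses most of its advantage once it must compete.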
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15817