Epistemic Vigilance in Multi-Agent Systems: Defending Against Semantic Contagion via Recursive Pragmatic Analysis
Keywords: Multi-Agent Systems, Semantic Contagion, Theory of Mind, Epistemic Vigilance, LLM Safety, Adversarial Attacks, AgentHazard
Abstract: Multi-agent systems (MAS) face a critical vulnerability: Semantic Contagion, in which a compromised agent manipulates group dynamics to induce unsafe behaviors without triggering any individual agent's safety filters. We identify Machiavellian Injection adversaries, which exploit Theory of Mind (ToM) to construct gradual persuasion chains, and propose the Epistemic Vigilance Protocol (EVP), a decentralized defense inspired by human cognitive immunology. EVP equips agents with a Pragmatic Intent Auditor (PIA) that analyzes implicit goals, Recursive Trust Dynamics (RTD) for adaptive isolation, and Counterfactual Consensus (CC) for group-level deliberation. On our AgentHazard benchmark (800 scenarios across 8 domains), EVP lowers the attack success rate from 60% to 8%, a relative reduction of 87%, while retaining 92% of utility at only 1.5x overhead.
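The 87% figure is a relative reduction, not a percentage-point drop. The short check below recomputes both quantities from the two attack success rates stated in the abstract (60% without EVP, 8% with EVP). The variable names are illustrative only and are not taken from the paper.

```python
# Attack success rates stated in the abstract, as fractions of scenarios.
asr_baseline = 0.60  # attack success rate without EVP
asr_evp = 0.08       # attack success rate with EVP

# Absolute drop, measured in percentage points.
absolute_drop = asr_baseline - asr_evp
# Relative drop: the share of previously successful attacks that EVP stops.
relative_drop = absolute_drop / asr_baseline

print(f"absolute drop: {absolute_drop * 100:.0f} percentage points")
print(f"relative drop: {relative_drop * 100:.1f}%")
```

The relative drop evaluates to 52/60, about 86.7%, which rounds to the 87% reported. The absolute drop is 52 percentage points.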
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Adversarial Attacks, Multi-Agent Systems, Safety, Trustworthiness, Ethics
Contribution Types: Data resources, Theory
Languages Studied: English
Submission Number: 4854