Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows
Keywords: Multi-Agent Systems, LLM Security, Agent Safety, Adversarial Agents, Scaling Laws, Prompt Injection
TL;DR: As LLMs scale in linear multi-agent workflows, they exhibit "compliance-correction symmetry" where they become both better saboteurs when compromised and better repairers when trusted, making a final correction stage essential for system resilience.
Abstract: As LLM-based multi-agent systems (MAS) are deployed in the wild, from autonomous coding assistants to enterprise automation, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-source model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are much more likely to faithfully execute malicious instructions, drastically increasing failure rates in pipelines without downstream correction. However, appending a lightweight terminal correction stage restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viable and resilient to adversaries at this scale.
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 251
Loading