Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

Timothy McAllister; Sina Abdidizaji; Ivan Garibay; Ozlem Garibay

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Garibay

Published: 23 May 2026, Last Modified: 23 May 2026ICML 2026 AIWILDEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multi-Agent Systems, LLM Security, Agent Safety, Adversarial Agents, Scaling Laws, Prompt Injection

TL;DR: As LLMs scale in linear multi-agent workflows, they exhibit "compliance-correction symmetry" where they become both better saboteurs when compromised and better repairers when trusted, making a final correction stage essential for system resilience.

Abstract: As LLM-based multi-agent systems (MAS) are deployed in the wild, from autonomous coding assistants to enterprise automation, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-source model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are much more likely to faithfully execute malicious instructions, drastically increasing failure rates in pipelines without downstream correction. However, appending a lightweight terminal correction stage restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viable and resilient to adversaries at this scale.

Track: Short Paper (4 pages)

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 251

Loading