LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search

Published: 08 Jul 2025 · Last Modified: 26 Aug 2025 · COLM 2025 · CC BY 4.0
Keywords: multi-agent system, adversarial attack, LLM-based jailbreak
TL;DR: We attack an LLM-based multi-agent system with only a single intervention; we propose a token-level optimization method.
Abstract: Large Language Model (LLM)-based Multi-Agent Systems (MAS) have demonstrated remarkable capability on complex tasks. However, emerging evidence indicates significant security vulnerabilities within these systems. In this paper, we introduce three novel and practical attack scenarios that allow only a single intervention on one agent of the MAS, a setting in which previous methods struggle to succeed. We therefore propose Continuous Optimization with Discrete Efficient Search (CODES), a token-level jailbreak method that combines continuous-space optimization with discrete-space search to efficiently generate self-replicating attack prompts. Through CODES, malicious content propagates across multiple agents, compromising the entire MAS. CODES demonstrates its effectiveness in all three realistic threat scenarios, ranging from triggering offensive outputs across an entire agent cohort to bypassing multi-level safeguard modules. Our findings underscore the urgent need for more robust safety mechanisms tailored to MAS and highlight the importance of developing resilient alignment strategies to defend against this new class of adversarial threats.
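The abstract describes combining continuous-space optimization with discrete-space search at the token level. The toy sketch below illustrates the general shape of such a two-phase step (as in GCG-style attacks): a gradient over a one-hot token relaxation ranks candidate swaps, and a discrete search evaluates the top candidates exactly. The scoring function, embedding table, and step structure here are all hypothetical simplifications (a linear surrogate objective), not the paper's actual CODES algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, d = 50, 8, 16          # toy vocab size, prompt length, embedding dim
E = rng.normal(size=(V, d))  # toy embedding table (stand-in for the LLM's)
w = rng.normal(size=d)       # toy "attack objective" direction

def score(tokens):
    # surrogate attack objective: higher = closer to the adversarial target
    return float(E[tokens].sum(axis=0) @ w)

def two_phase_step(tokens, top_k=5):
    """One optimization step: continuous gradient ranking + discrete search."""
    tokens = list(tokens)
    best = score(tokens)
    for pos in range(len(tokens)):
        # Continuous phase: for this linear objective, the gradient of the
        # score w.r.t. the one-hot token vector at `pos` is E @ w; use it to
        # rank candidate replacement tokens without enumerating the vocab.
        grad = E @ w
        candidates = np.argsort(-grad)[:top_k]
        # Discrete phase: evaluate the top-k swaps exactly, keep the best.
        for cand in candidates:
            trial = tokens[:pos] + [int(cand)] + tokens[pos + 1:]
            s = score(trial)
            if s > best:
                best, tokens = s, trial
    return tokens, best

init = list(rng.integers(0, V, size=L))
opt, final = two_phase_step(init)
print(final >= score(init))  # the greedy step never decreases the score
```

In a real attack the gradient would come from backpropagation through the target model's loss rather than a closed form, and the discrete phase would re-query the model per candidate; the continuous phase exists precisely to keep the number of those expensive discrete evaluations small.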
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 204