TL;DR: We build a mechanism that grants a VLM-based multi-agent system specific immunity against infectious jailbreaking attacks.
Abstract: Vision Language Model (VLM) Agents are stateful, autonomous entities capable of perceiving and interacting with their environments through vision and language.
Multi-agent systems comprise specialized agents that collaborate to solve a (complex) task.
A core security property is **robustness**: the system maintains its integrity under adversarial attack.
Multi-agent systems lack robustness: a successful exploit against one agent can spread and **infect** other agents, undermining the entire system's integrity.
We propose Cowpox, a defense that provably enhances the robustness of a multi-agent system through a distributed mechanism that improves the **recovery rate** of agents, thereby limiting the expected number of infections passed on to other agents.
The core idea is to generate and distribute a special *cure sample* that immunizes an agent against the attack before exposure.
We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.
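To see why improving the recovery rate bounds the spread, note the epidemiological analogy implicit in the abstract: if each infected agent is cured faster than it passes the exploit on, the expected number of secondary infections falls below one and the outbreak dies out. Below is a minimal, hypothetical sketch of that intuition, a toy SIR-style simulation in Python rather than the paper's actual Cowpox implementation; all names and parameters (`beta`, `gamma`, `contacts_per_step`) are illustrative assumptions:

```python
import random

# Toy SIR-style simulation (illustrative sketch, not the paper's method):
# shows that raising the per-step recovery rate of agents contains an
# infectious jailbreak spreading through a multi-agent system.

def simulate(n_agents=100, beta=0.3, gamma=0.5, contacts_per_step=2,
             steps=200, seed=0):
    """Return how many agents were ever infected.

    beta:  probability an infected agent jailbreaks a contacted agent.
    gamma: probability an infected agent is cured in a given step
           (e.g. by receiving a cure sample) -- the rate a Cowpox-like
           defense aims to raise.
    """
    rng = random.Random(seed)
    infected = {0}        # patient zero carries the jailbreak
    recovered = set()     # cured agents are immune to re-infection
    ever_infected = {0}
    for _ in range(steps):
        newly_infected, newly_recovered = set(), set()
        for agent in infected:
            # each infected agent contacts a few random agents per step
            for _ in range(contacts_per_step):
                target = rng.randrange(n_agents)
                if target not in infected and target not in recovered \
                        and rng.random() < beta:
                    newly_infected.add(target)
            # the agent may be cured this step
            if rng.random() < gamma:
                newly_recovered.add(agent)
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        ever_infected |= newly_infected
        if not infected:
            break  # outbreak extinguished
    return len(ever_infected)

# Expected secondary infections per infected agent is roughly
# R0 = beta * contacts_per_step / gamma; the outbreak dies out when R0 < 1.
print("slow recovery:", simulate(gamma=0.1))   # R0 = 6    -> widespread
print("fast recovery:", simulate(gamma=0.9))   # R0 ~ 0.67 -> contained
```

The point the sketch illustrates is that such a defense need not block every individual infection attempt; it only needs to push the effective reproduction number of the attack below one.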
Lay Summary: In today's world, artificial intelligence systems are becoming more advanced and capable of interacting with the world through both images and language. Some of these AI systems, called Vision-Language Model (VLM) Agents, can even work together in teams—each agent doing a specific job—to tackle complex problems.
However, like humans, these AI teams can face serious risks. If one agent is tricked or attacked, the problem can quickly spread to others, just like a virus, threatening the entire team’s ability to function. This makes security, especially something called robustness—the system’s ability to stay safe and reliable under attack—extremely important.
To tackle this, the researchers propose a new preventive defense method called Cowpox. Inspired by the idea of vaccines, Cowpox helps protect these AI agents before they’re attacked. It does this by sending out special "cure samples" that train agents to recognize and resist the threat early on. Think of it like giving the team a heads-up and a shield before danger strikes.
Primary Area: Social Aspects->Security
Keywords: VLM, Adversarial Examples, Infectious Attack
Submission Number: 12309