On the Adversarial Robustness of Visual-Language Chat Models

Published: 01 Jan 2025, Last Modified: 19 Sept 2025 | ICMR 2025 | CC BY-SA 4.0
Abstract: With the rapid development of large language models (LLMs), there has been strong interest in integrating additional modalities, such as image understanding, into them. While the resulting Visual Language Models (VLMs) have shown impressive performance on a range of multimodal tasks, their robustness has not been thoroughly investigated. In this work, we focus on the robustness of VLMs to visual adversarial examples and explore what such examples can achieve. We highlight that the multimodal nature of VLMs presents a unique attack surface for manipulating the outputs of the underlying LLM, and that the continuous nature of visual inputs further increases the effectiveness of adversarial attacks against language generative models. Furthermore, we demonstrate three application scenarios for adversarial examples targeting VLMs: image description, jailbreaking, and information hiding. We conduct experiments on several leading open-source VLMs and show that adversarial examples succeed in all of the proposed scenarios. We hope that our findings will inform the development of multimodal models that are more robust to adversarial attacks. Our code is available at https://github.com/lafeat/m3-break.
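To illustrate the point about continuous visual inputs, below is a minimal sketch of a PGD-style (projected gradient descent) attack on the image fed to a VLM. This is not the paper's implementation (see the repository above); the function names `target_loss` and `pgd_attack`, and the budget values `eps` and `alpha`, are illustrative assumptions. In practice, `target_loss` would be the VLM decoder's cross-entropy on an attacker-chosen target string, differentiated with respect to the image pixels; here a simple differentiable stand-in keeps the sketch self-contained and runnable.

```python
# Sketch of an L-inf PGD attack on a VLM's continuous image input.
# `target_loss` is a hypothetical placeholder for -log p_VLM(target_text | image, prompt).
import torch

def target_loss(image: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real VLM loss; any differentiable function of the image
    # works for demonstrating the optimization loop.
    return (image - 0.5).pow(2).mean()

def pgd_attack(image, steps=100, eps=8 / 255, alpha=1 / 255):
    """Minimize the target loss while staying in an L-inf ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = target_loss(adv)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()               # signed gradient step
            adv = image + (adv - image).clamp(-eps, eps)  # project back into the eps-ball
            adv = adv.clamp(0.0, 1.0)                     # keep a valid pixel range
    return adv.detach()

clean = torch.rand(1, 3, 224, 224)    # stand-in input image in [0, 1]
adv = pgd_attack(clean)
print((adv - clean).abs().max())      # perturbation magnitude stays within eps
```

Because the image is a continuous tensor, gradients flow directly from the language model's output back to the pixels, which is what makes this attack surface so much easier to optimize over than discrete text tokens.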