Augment Semantics, Transfer Better: Unveiling Adversarial Transferability in Multimodal Large Language Models
Abstract: Recently, Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in cross-modality interaction, yet they remain vulnerable to adversarial examples. The transferability of adversarial examples, which enables cross-model attacks and therefore poses a more serious threat, remains an open challenge. In this paper, we provide a comprehensive analysis of the transferability of adversarial examples generated against MLLMs. To explore their potential real-world impact, we study two tasks with both negative and positive societal implications: ❶ Harmful Word Insertion and ❷ Information Protection. Furthermore, we identify two key factors that significantly affect adversarial transferability and find that semantic-level data augmentation can effectively boost it. We also propose two novel semantic-level data augmentation methods, Adding Image Patch (AIP) and the Typography Augment Transferability Method (TATM), which substantially improve the transferability of adversarial examples across MLLMs.
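The abstract does not specify how the two augmentations are implemented; a minimal sketch of what pixel-level semantic augmentations of this kind might look like is given below, assuming AIP pastes a small content patch onto the input image and TATM renders typographic text onto it. Function names, positions, sizes, and the use of Pillow are illustrative assumptions, not the paper's actual implementation.

```python
from PIL import Image, ImageDraw

def add_image_patch(image: Image.Image, patch: Image.Image,
                    position: tuple[int, int] = (0, 0)) -> Image.Image:
    """AIP-style augmentation (hypothetical): paste a small semantic patch onto the image."""
    augmented = image.copy()
    augmented.paste(patch.resize((64, 64)), position)
    return augmented

def add_typography(image: Image.Image, text: str = "example",
                   position: tuple[int, int] = (10, 10)) -> Image.Image:
    """TATM-style augmentation (hypothetical): overlay typographic text on the image."""
    augmented = image.copy()
    draw = ImageDraw.Draw(augmented)
    draw.text(position, text, fill=(255, 0, 0))  # default font, red text
    return augmented
```

In a transfer-attack pipeline, such augmentations would typically be applied to the input image at each optimization step so that the adversarial perturbation is crafted over semantically varied views, which is the mechanism the abstract credits for improved cross-model transferability.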
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodal Large Language Models; Adversarial Transferability; Data Augmentation
Languages Studied: English
Submission Number: 985