Augment Semantics, Transfer Better: Unveiling Adversarial Transferability in Multimodal Large Language Models
Abstract: Recently, Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in cross-modality interaction, yet they remain vulnerable to adversarial examples. The transferability of adversarial examples, which enables cross-model attacks and therefore poses a more serious threat, remains an open challenge. In this paper, we provide a comprehensive analysis of the transferability of adversarial examples generated against MLLMs. To explore their potential real-world impact, we study two tasks with both negative and positive societal implications: ❶ Harmful Word Insertion and ❷ Information Protection. Furthermore, we identify two key factors that significantly affect adversarial transferability and find that semantic-level data augmentation can effectively boost it. We also propose two novel semantic-level data augmentation methods, Adding Image Patch (AIP) and the Typography Augment Transferability Method (TATM), which substantially improve the transferability of adversarial examples across MLLMs.
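The abstract does not specify how the two augmentations are implemented; a minimal sketch of what pixel-level semantic augmentations of this kind might look like is given below, assuming AIP pastes a small content patch onto the input image and TATM renders typographic text onto it. Function names, positions, sizes, and the use of Pillow are illustrative assumptions, not the paper's actual implementation.

```python
from PIL import Image, ImageDraw

def add_image_patch(image: Image.Image, patch: Image.Image,
                    position: tuple[int, int] = (0, 0)) -> Image.Image:
    """AIP-style augmentation (hypothetical): paste a small semantic patch onto the image."""
    augmented = image.copy()
    augmented.paste(patch.resize((64, 64)), position)
    return augmented

def add_typography(image: Image.Image, text: str = "example",
                   position: tuple[int, int] = (10, 10)) -> Image.Image:
    """TATM-style augmentation (hypothetical): overlay typographic text on the image."""
    augmented = image.copy()
    draw = ImageDraw.Draw(augmented)
    draw.text(position, text, fill=(255, 0, 0))  # default font, red text
    return augmented
```

In a transfer-attack pipeline, such augmentations would typically be applied to the input image at each optimization step so that the adversarial perturbation is crafted over semantically varied views, which is the mechanism the abstract credits for improved cross-model transferability.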
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodal Large Language Models; Adversarial Transferability; Data Augmentation
Languages Studied: English
Submission Number: 985