Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models

Published: 01 Jul 2025, Last Modified: 01 Jul 2025, ICML 2025 R2-FM Workshop Poster, CC BY 4.0
Keywords: adversarial attack, vision language model
TL;DR: We propose a transferable adversarial attack that achieves high attack success rate on GPT-4o, Claude and Gemini
Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has greatly enhanced various applications but simultaneously raised significant security concerns, particularly regarding visual adversarial attacks. Current adversarial robustness evaluations are limited to simple tasks such as object classification and short captioning. We therefore introduce new evaluation settings: in addition to image captioning, open-ended Visual Question Answering (VQA) and text spotting are introduced to challenge existing attack methods. We propose a systematic transfer-based adversarial pipeline that improves attack transferability to proprietary black-box MLLMs at the model, loss-function, and data levels. Empirical results demonstrate strong transferability, achieving up to 84.8% and 47.1% success rates on GPT-4o and Claude 3.5 for image captioning ($\epsilon=8/255$), and 31% and 24% for text recognition ($\epsilon=16/255$). Our work demonstrates that transfer-based attacks on the image modality are feasible and highly effective even against proprietary MLLMs.
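The abstract does not detail the attack procedure itself, so the following is only a minimal, hypothetical sketch of the generic transfer setting it builds on: an $L_\infty$-bounded PGD attack on a surrogate image encoder that maximizes feature deviation, with the resulting image then submitted to a black-box MLLM. The `encoder` callable, the cosine feature-deviation loss, and all hyperparameters are assumptions for illustration, not the paper's actual model-, loss-, or data-level design.

```python
# Hypothetical sketch: L_inf PGD on a surrogate image encoder for a
# transfer-based attack. Not the paper's pipeline; illustrative only.
import torch
import torch.nn.functional as F

def pgd_transfer_attack(encoder, image, eps=8/255, alpha=1/255, steps=100):
    """Craft an adversarial image on a white-box surrogate encoder by
    maximizing feature deviation from the clean image, then transfer it
    to a black-box MLLM (e.g., via its API) for evaluation."""
    encoder.eval()
    with torch.no_grad():
        clean_feat = encoder(image)                      # features of the clean image
    delta = torch.zeros_like(image).uniform_(-eps, eps)  # random start inside the L_inf ball
    delta.requires_grad_(True)
    for _ in range(steps):
        adv_feat = encoder((image + delta).clamp(0, 1))
        # Ascend on the negative cosine similarity, i.e. push the adversarial
        # features away from the clean features.
        loss = -F.cosine_similarity(adv_feat, clean_feat, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()           # signed-gradient PGD step
            delta.clamp_(-eps, eps)                      # project back into the budget
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

In practice, transfer-based pipelines like the one described in the abstract typically strengthen this baseline with surrogate-model ensembles, alternative loss formulations, and input augmentations before querying the proprietary target model.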
Submission Number: 63