Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models

Published: 01 Jul 2025, Last Modified: 01 Jul 2025, ICML 2025 R2-FM Workshop Poster, CC BY 4.0
Keywords: adversarial attack, vision language model
TL;DR: We propose a transferable adversarial attack that achieves high attack success rate on GPT-4o, Claude and Gemini
Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has greatly enhanced various applications but simultaneously raised significant security concerns, particularly regarding visual adversarial attacks. Current adversarial robustness evaluations are limited to simple tasks such as object classification and short captioning. We therefore introduce new evaluation settings: in addition to image captioning, open-ended Visual Question Answering (VQA) and text spotting are introduced to challenge existing attack methods. We propose a systematic transfer-based adversarial pipeline that improves attack transferability to proprietary black-box MLLMs at the model, loss-function, and data levels. Empirical results demonstrate strong transferability, achieving up to 84.8% and 47.1% success rates on GPT-4o and Claude 3.5 for image captioning ($\epsilon=8/255$), and 31% and 24% for text recognition ($\epsilon=16/255$). Our work demonstrates that transfer-based attacks on the image modality are feasible and highly effective even against proprietary MLLMs.
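The abstract does not detail the attack procedure itself, so the following is only a minimal, hypothetical sketch of the generic transfer setting it builds on: an $L_\infty$-bounded PGD attack on a surrogate image encoder that maximizes feature deviation, with the resulting image then submitted to a black-box MLLM. The `encoder` callable, the cosine feature-deviation loss, and all hyperparameters are assumptions for illustration, not the paper's actual model-, loss-, or data-level design.

```python
# Hypothetical sketch: L_inf PGD on a surrogate image encoder for a
# transfer-based attack. Not the paper's pipeline; illustrative only.
import torch
import torch.nn.functional as F

def pgd_transfer_attack(encoder, image, eps=8/255, alpha=1/255, steps=100):
    """Craft an adversarial image on a white-box surrogate encoder by
    maximizing feature deviation from the clean image, then transfer it
    to a black-box MLLM (e.g., via its API) for evaluation."""
    encoder.eval()
    with torch.no_grad():
        clean_feat = encoder(image)                      # features of the clean image
    delta = torch.zeros_like(image).uniform_(-eps, eps)  # random start inside the L_inf ball
    delta.requires_grad_(True)
    for _ in range(steps):
        adv_feat = encoder((image + delta).clamp(0, 1))
        # Ascend on the negative cosine similarity, i.e. push the adversarial
        # features away from the clean features.
        loss = -F.cosine_similarity(adv_feat, clean_feat, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()           # signed-gradient PGD step
            delta.clamp_(-eps, eps)                      # project back into the budget
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

In practice, transfer-based pipelines like the one described in the abstract typically strengthen this baseline with surrogate-model ensembles, alternative loss formulations, and input augmentations before querying the proprietary target model.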
Submission Number: 63