Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models

Published: 26 Jan 2026, Last Modified: 27 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Adversarial Attacks, Robustness
Abstract: Existing adversarial attacks on Large Vision-Language Models (LVLMs) often struggle with limited transferability to black-box models or produce perceptible artifacts that are easily detected. This paper presents Progressive Semantic Infusion (PSI), a diffusion-based attack that progressively aligns and infuses natural target semantics. To improve transferability, PSI leverages diffusion priors to better align adversarial examples with the natural image distribution and employs progressive alignment to mitigate overfitting to a single fixed surrogate objective. To enhance stealthiness, PSI embeds source-aware cues during denoising to preserve visual fidelity and avoid detectable artifacts. Experiments show that PSI effectively attacks open-source, adversarially trained, and commercial VLMs, including GPT-5 and Grok-4, surpassing existing methods in both transferability and stealthiness. Our findings highlight a critical vulnerability in modern vision-language systems and offer valuable insights toward building more robust and trustworthy multimodal models.
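The transfer-attack setting the abstract alludes to — optimizing a bounded image perturbation so a surrogate encoder's embedding matches some target semantics — can be illustrated with a minimal, purely hypothetical sketch. This is NOT the paper's PSI method (no diffusion prior, no progressive alignment); it is a generic PGD-style loop on a toy linear stand-in for a vision encoder, with all names and parameters invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a surrogate image encoder (linear, illustration only;
# a real transfer attack would use a pretrained vision encoder).
W = rng.normal(size=(8, 16))            # maps a 16-dim "image" to an 8-dim embedding
encode = lambda x: W @ x

x_src = rng.normal(size=16)             # source "image"
e_tgt = encode(rng.normal(size=16))     # embedding of the target semantics
e_tgt /= np.linalg.norm(e_tgt)

def cos_to_target(x):
    e = encode(x)
    return float(e @ e_tgt) / np.linalg.norm(e)

eps, alpha, steps = 0.5, 0.05, 200      # hypothetical L-inf budget and step size
delta = np.zeros_like(x_src)
cos_before = cos_to_target(x_src)

for _ in range(steps):
    e = encode(x_src + delta)
    n = np.linalg.norm(e)
    # gradient of cosine similarity w.r.t. the input (closed form since encode is linear)
    grad = W.T @ (e_tgt / n - (e @ e_tgt) * e / n**3)
    # signed gradient ascent step, projected back into the L-inf ball
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)

cos_after = cos_to_target(x_src + delta)
```

After the loop, `cos_after` exceeds `cos_before`: the perturbed input's embedding has drifted toward the target semantics while `delta` stays inside the budget. The paper's contribution, per the abstract, is replacing this single fixed surrogate objective with progressive alignment under a diffusion prior to avoid exactly the overfitting such a loop exhibits.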
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8234