Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models
Abstract: Pre-trained Vision-Language Models (VLMs) have shown great ability in various Vision-Language tasks. However, these VLMs exhibit inherent vulnerabilities to transferable adversarial examples, which could undermine their performance and reliability in real-world applications. Cross-modal interactions have been shown to be key to boosting adversarial transferability, yet existing multimodal transferable adversarial attacks exploit them only to a limited extent. Stable Diffusion, which contains multiple cross-attention modules, has great potential to facilitate adversarial transferability by leveraging abundant cross-modal interactions. We therefore propose a Multimodal Diffusion-based Attack (MDA), which conducts adversarial attacks against VLMs using Stable Diffusion. Specifically, MDA first generates adversarial text, which is then used as guidance to optimize the adversarial image during the diffusion process. Besides leveraging the adversarial text in the downstream loss to obtain gradients for optimizing the image, MDA also uses it as the guiding prompt for adversarial image generation during the denoising process, which enriches the cross-modal interactions and thus strengthens adversarial transferability. Compared with pixel-based attacks, MDA introduces perturbations in the latent space rather than the pixel space to manipulate high-level semantics, which further improves adversarial transferability. Experimental results demonstrate that the adversarial examples generated by MDA are highly transferable across different VLMs on different downstream tasks, surpassing state-of-the-art methods by a large margin.
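To make the two-stage procedure described in the abstract concrete, below is a minimal sketch of the MDA optimization loop. It is not the authors' implementation: `generate_adversarial_text`, `denoise_latent`, and `vlm_matching_loss` are hypothetical placeholders standing in for (i) a text attack on the caption, (ii) Stable Diffusion denoising and decoding conditioned on a prompt, and (iii) a surrogate VLM's downstream loss; the exact objective and loss sign are illustrative assumptions.

```python
# Hedged sketch of the MDA loop: adversarial text is crafted once, then the
# diffusion latent (not raw pixels) is optimized while that text serves both
# as the denoising prompt and as input to the surrogate VLM's loss.
import torch


def generate_adversarial_text(caption: str) -> str:
    """Placeholder: perturb the caption (e.g., word substitution) so a
    surrogate VLM mis-aligns it with the clean image."""
    raise NotImplementedError


def denoise_latent(latent: torch.Tensor, prompt: str) -> torch.Tensor:
    """Placeholder: run Stable Diffusion denoising plus VAE decoding with
    `prompt` as cross-attention guidance; must stay differentiable in `latent`."""
    raise NotImplementedError


def vlm_matching_loss(image: torch.Tensor, text: str) -> torch.Tensor:
    """Placeholder: downstream loss of a surrogate VLM (e.g., image-text
    matching) between the generated image and the given text."""
    raise NotImplementedError


def mda_attack(clean_latent: torch.Tensor, caption: str,
               steps: int = 50, lr: float = 0.01) -> torch.Tensor:
    # Stage 1: craft the adversarial caption up front.
    adv_text = generate_adversarial_text(caption)

    # Stage 2: optimize the latent; the adversarial text appears twice,
    # as the guiding prompt and inside the downstream loss.
    latent = clean_latent.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        image = denoise_latent(latent, prompt=adv_text)
        # Illustrative objective (an assumption): push the generated image
        # toward the adversarial semantics as scored by the surrogate VLM.
        loss = -vlm_matching_loss(image, adv_text)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return denoise_latent(latent.detach(), prompt=adv_text)
```

Operating in the latent space is what distinguishes this from pixel-budget attacks: the perturbation is expressed through the denoising trajectory, so it changes high-level semantics rather than adding bounded pixel noise.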
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications, [Content] Media Interpretation
Relevance To Conference: To enhance the transferability of multimodal attacks, this work proposes the Multimodal Diffusion-based Attack (MDA), an unrestricted attack on Vision-Language Pre-training (VLP) models that uses diffusion models. Specifically, MDA first generates adversarial text, which is then used as guidance to optimize the adversarial image during the diffusion process. Besides leveraging the adversarial text in the downstream loss to obtain gradients for optimizing the image, MDA also uses it as the guiding prompt for adversarial image generation during the denoising process, which enriches the cross-modal interactions and thus strengthens adversarial transferability. According to the experimental results, adversarial examples generated by MDA achieve state-of-the-art transferability across different VLP models on different downstream tasks. This work contributes to multimodal processing by exploring the potential of diffusion models for conducting multimodal unrestricted attacks on VLP models, introducing MDA and demonstrating its high transferability. Given the harm that multimodal adversarial examples pose to trustworthy AI, we urge the community to pay more attention to the compliant use of multimodal generative models.
Supplementary Material: zip
Submission Number: 4547