Adversarial-Guided Diffusion for Robust and High-Fidelity Multimodal LLM Attacks

17 Sept 2024 (modified: 12 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: adversarial attack, multimodal large language models
Abstract: Recent diffusion-based adversarial attack methods have shown promising results in generating natural adversarial images. However, these methods often lack fidelity: even small perturbations to the latent representation can induce significant distortion in the original image. In this paper, we propose Adversarial-Guided Diffusion (AGD), a novel diffusion-based generative adversarial attack framework that introduces adversarial noise during the reverse sampling of conditional diffusion models. AGD uses editing-friendly inversion sampling to faithfully reconstruct images, so that gradients applied to the latent representation do not significantly distort them. In addition, AGD strengthens the latent representation by carefully selecting the sampling steps at which adversarial semantics are injected, making the injection smoother. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in both attack effectiveness, generating adversarial images that successfully mislead the responses of multimodal large language models (MLLMs) under targeted attacks, and image quality. We argue that the security concerns surrounding the adversarial robustness of MLLMs deserve increased attention from the research community.
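To make the framework described in the abstract concrete, below is a minimal, hypothetical sketch of adversarial guidance injected into deterministic DDIM-style reverse sampling. It is not the authors' implementation: the noise-prediction network `unet`, the attack loss `attack_loss_fn` (e.g., similarity between the predicted clean image and a target caption under the victim MLLM's vision encoder), the step size `eta_adv`, and the set of guided timesteps are all assumptions for illustration.

```python
import torch

def adversarial_guided_sampling(
    unet,            # hypothetical noise-prediction network eps_theta(x_t, t, c)
    latents,         # inverted latent x_T of the source image
    cond,            # conditioning embedding (e.g., a text prompt)
    attack_loss_fn,  # hypothetical scalar loss driving the targeted attack
    timesteps,       # descending list of diffusion timesteps
    guided_steps,    # subset of timesteps at which adversarial gradients are injected
    alphas_cumprod,  # precomputed cumulative-product noise schedule, indexed by t
    eta_adv=0.05,    # adversarial step size on the latent (assumed value)
):
    """Sketch: DDIM reverse sampling with adversarial guidance on the latent.

    At selected steps, the latent is nudged along the gradient of an attack
    loss computed on the predicted clean image, so adversarial semantics are
    injected gradually while the remaining steps preserve fidelity.
    """
    x = latents
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else None
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t_prev] if t_prev is not None else torch.tensor(1.0)

        with torch.no_grad():
            eps = unet(x, t, cond)                            # predicted noise
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # predicted clean image

        if t in guided_steps:
            # Adversarial guidance: ascend the attack loss w.r.t. the latent.
            x_req = x.detach().requires_grad_(True)
            eps_g = unet(x_req, t, cond)
            x0_g = (x_req - (1 - a_t).sqrt() * eps_g) / a_t.sqrt()
            loss = attack_loss_fn(x0_g)                       # scalar attack objective
            grad = torch.autograd.grad(loss, x_req)[0]
            x = x_req.detach() + eta_adv * grad.sign()        # PGD-style latent update
            # Recompute step quantities from the perturbed latent.
            with torch.no_grad():
                eps = unet(x, t, cond)
            x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()

        # Deterministic DDIM update to the previous timestep.
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps

    return x
```

Restricting the gradient update to a chosen subset of timesteps mirrors the abstract's idea of intelligently choosing sampling steps, so adversarial semantics enter smoothly while the unguided steps reconstruct the image faithfully from its inverted latent.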
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1281