Non-targeted Adversarial Attacks on Vision-Language Models via Maximizing Information Entropy

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Adversarial Attacks, Vision-Language Models, Trustworthy AI
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Adversarial examples pose significant security concerns for deep neural networks and play a crucial role in assessing model robustness. However, existing research has primarily focused on classification tasks, while the evaluation of adversarial examples is urgently needed for more complex tasks. In this paper, we investigate the adversarial robustness of large vision-language models (VLMs). We propose a non-targeted white-box attack method that maximizes information entropy (MIE) to induce the victim model to generate misleading image descriptions that deviate from reality. We analyze our method thoroughly through experiments, with validation conducted on the ImageNet dataset. Comprehensive, quantitative experimental results demonstrate that our method achieves a high attack success rate. Given the consistent architecture of the language decoder, our proposed method can serve as a benchmark for evaluating the robustness of diverse vision-language models.
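To illustrate the kind of objective the abstract describes, the sketch below shows a generic entropy-maximizing white-box image attack in PyTorch. It is not the authors' exact MIE implementation: the model interface (`model(pixel_values=..., input_ids=...)` returning `.logits`, in the style of a HuggingFace captioning VLM), the PGD-style update, and all hyperparameters (`eps`, `alpha`, `steps`) are assumptions made for illustration only.

```python
# Hypothetical sketch: maximize the entropy of a VLM's per-token output
# distribution under an L_inf perturbation budget (PGD-style gradient ascent).
import torch
import torch.nn.functional as F

def entropy_of_logits(logits):
    """Mean Shannon entropy of the per-token output distributions."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean()

def entropy_maximizing_attack(model, pixel_values, input_ids,
                              eps=8 / 255, alpha=1 / 255, steps=40):
    """Gradient-ascent loop on output entropy, projected onto an L_inf ball."""
    x_orig = pixel_values.detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(pixel_values=x_adv, input_ids=input_ids).logits
        loss = entropy_of_logits(logits)              # objective to maximize
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend on entropy
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project to budget
            x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv.detach()
```

The intuition matching the abstract is that a maximally uncertain output distribution pushes the decoder away from the correct description, yielding captions that deviate from the image content without targeting any specific wrong caption.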
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2947