TUAP: Targeted Universal Adversarial Perturbations for CLIP

25 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Adversarial, Universal, Perturbation, Vision Language Model, VLM, CLIP
TL;DR: This paper reveals a universal vulnerability in CLIP models and large vision-language models to targeted universal adversarial perturbations in a black-box setting.
Abstract: As Contrastive Language-Image Pretraining (CLIP) models are increasingly adopted in a wide range of downstream tasks and large Vision-Language Models (VLMs), their vulnerability to adversarial attacks has attracted growing attention. In this work, we examine the susceptibility of CLIP models to Universal Adversarial Perturbations (UAPs). Unlike existing works that focus on untargeted attacks in a white-box setting, we investigate targeted UAPs (TUAPs) in a black-box setting, with a particular emphasis on transferability. In TUAP, the adversary specifies a targeted adversarial text description and generates either a universal $L_{\infty}$- or $L_2$-norm-bounded perturbation or a small unrestricted patch, using an ensemble of surrogate CLIP encoders. When the TUAP is applied to different test images, it misleads the image encoders of unseen CLIP models into producing image embeddings that are consistently close to the adversarial target text embedding. We conduct comprehensive experiments to demonstrate the effectiveness and transferability of TUAPs. This universal transferability extends not only across different datasets and models but also to downstream models, such as large VLMs including OpenFlamingo, LLaVA, MiniGPT-4, and BLIP2. TUAP can mislead them into generating responses that contain text descriptions specified by the adversaries. Our findings reveal a universal vulnerability in CLIP models to targeted adversarial attacks, emphasizing the need for effective countermeasures.
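The optimization the abstract describes — sign-gradient ascent on the cosine similarity between perturbed-image embeddings and a target text embedding, averaged over an ensemble of surrogate encoders, with the shared perturbation clipped to an $L_\infty$ ball — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the "encoders" here are random linear maps standing in for CLIP image encoders, the target embedding is a random vector standing in for the text embedding of the adversarial description, and all hyperparameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration): each "encoder" is a random
# linear map in place of a surrogate CLIP image encoder, and the target is a
# random unit vector in place of the adversarial text embedding.
D_IMG, D_EMB, N_ENC = 64, 16, 3
encoders = [rng.standard_normal((D_EMB, D_IMG)) for _ in range(N_ENC)]
target = rng.standard_normal(D_EMB)
target /= np.linalg.norm(target)

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def tuap(images, encoders, target, eps=8 / 255, alpha=1 / 255, steps=200):
    """Sign-gradient ascent on the mean cosine similarity between perturbed
    image embeddings and the target embedding; the single shared perturbation
    delta is kept inside the L_inf ball of radius eps."""
    delta = np.zeros(images.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(delta)
        for x in images:
            for W in encoders:
                # For a linear encoder z = W(x + delta), the gradient of
                # cos(z, target) w.r.t. delta has a closed form: W^T dcos/dz.
                z = W @ (x + delta)
                nz = np.linalg.norm(z) + 1e-12
                g_z = target / nz - (z @ target) * z / nz**3
                grad += W.T @ g_z
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

images = rng.uniform(0, 1, size=(8, D_IMG))
delta = tuap(images, encoders, target)
before = np.mean([cos_sim(W @ x, target) for x in images for W in encoders])
after = np.mean([cos_sim(W @ (x + delta), target) for x in images for W in encoders])
print(f"mean cosine to target: {before:.3f} -> {after:.3f}")
```

Because `delta` is optimized jointly over many images and several surrogate encoders, it is universal (image-agnostic) by construction; the paper's black-box transfer claim is that this same perturbation then steers the embeddings of *unseen* CLIP encoders toward the target text embedding.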
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5168