Keywords: Adversarial detection, Geometric Distance, Multimodal models
Abstract: Vision-language pre-trained models (VLPs) have been deployed in numerous real-world applications; however, these models remain vulnerable to adversarial attacks. Existing adversarial detection methods have shown their efficacy in single-modality settings (either vision or language), but their performance on multimodal models such as VLPs remains uncertain. In this work, we propose GAD-VLP, a novel approach to adversarial detection that identifies adversarial examples by exploiting vision and joint vision-language embeddings within VLP architectures. We leverage the geometry of the embedding space and demonstrate the distinctive characteristics of adversarial regions within these models. Depending on the type of VLP, we examine the embedding space of the vision modality or of the combined vision-language modalities to identify adversarial examples. Some of the geometric methods do not require explicit knowledge of the adversary's targets in downstream tasks (e.g., zero-shot classification or image-text retrieval), offering a model-agnostic detection framework applicable across VLPs. Despite their simplicity, we demonstrate that these methods deliver a nearly perfect detection rate against state-of-the-art adversarial attacks on VLPs, including both separate and combined attacks on the vision and joint modalities.
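The abstract describes detection based on the geometry of the embedding space. Below is a minimal sketch of one such geometric score, a k-nearest-neighbor distance over embeddings, assuming the embeddings come from a VLP's vision (or joint vision-language) encoder such as CLIP; the specific score, threshold rule, and names here are illustrative assumptions, not the paper's exact GAD-VLP procedure.

```python
# Minimal sketch of a geometric adversarial detector: a k-NN distance score
# over embeddings from a VLP encoder (e.g., CLIP's image tower).
# Assumption: embeddings are L2-normalized vectors; the score and threshold
# calibration below are illustrative, not the exact GAD-VLP method.
import numpy as np

def knn_distance_score(query: np.ndarray, clean_refs: np.ndarray, k: int = 10) -> float:
    """Mean Euclidean distance from `query` to its k nearest clean embeddings.
    Adversarial inputs tend to land in low-density regions of the embedding
    space, so larger scores suggest the input is adversarial."""
    dists = np.linalg.norm(clean_refs - query, axis=1)
    return float(np.sort(dists)[:k].mean())

def calibrate_threshold(clean_embs: np.ndarray, refs: np.ndarray,
                        k: int = 10, quantile: float = 0.95) -> float:
    """Choose a threshold as a high quantile of scores on held-out clean data."""
    scores = [knn_distance_score(e, refs, k) for e in clean_embs]
    return float(np.quantile(scores, quantile))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512
    # Stand-in embeddings; in practice these come from the VLP encoder.
    refs = rng.normal(size=(1000, dim))
    refs /= np.linalg.norm(refs, axis=1, keepdims=True)
    held_out = rng.normal(size=(200, dim))
    held_out /= np.linalg.norm(held_out, axis=1, keepdims=True)

    tau = calibrate_threshold(held_out, refs)
    flagged = knn_distance_score(held_out[0], refs) > tau
    print(f"threshold={tau:.3f}, flagged={flagged}")
```

In practice, the reference set would be clean embeddings from the target VLP, the threshold would be calibrated on held-out clean data, and test inputs whose score exceeds the threshold would be flagged as adversarial.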
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10075