AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks even under a black-box setting where the adversary can only query the model. In particular, query-based black-box adversarial attacks estimate adversarial gradients from the probability vectors that the target model returns for a sequence of queries. During this process, the queries made to the target model are intermediate adversarial examples crafted at previous attack steps, which are highly similar to one another in pixel space. Motivated by this observation, stateful detection methods have been proposed to detect and reject query-based attacks. While demonstrating promising results, these methods have either been evaded by more advanced attacks or suffer from low efficiency in terms of the number of shots (queries) required to detect different attacks. Arguably, the key challenge here is to assign high similarity scores to any two intermediate adversarial examples perturbed from the same image. To address this challenge, we propose a novel Adversarial Contrastive Prompt Tuning (ACPT) method to robustly fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries. With ACPT, we further introduce a detection framework, AdvQDet, that can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots. We also show that ACPT is robust to 3 types of adaptive attacks.
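The stateful detection idea described in the abstract can be illustrated with a minimal sketch. This is not the AdvQDet implementation: the class name `StatefulDetector`, the cosine-similarity rule, the threshold, the buffer size, and the toy `identity_encoder` are all assumptions made for illustration. In the paper, the embedding would come from the ACPT-tuned CLIP image encoder rather than from raw pixel values.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class StatefulDetector:
    """Hypothetical detector: flags a query that is too similar to a past one."""

    def __init__(self, encoder, threshold=0.95, buffer_size=1000):
        self.encoder = encoder          # maps a query to an embedding vector
        self.threshold = threshold      # assumed similarity cut-off
        self.history = deque(maxlen=buffer_size)  # embeddings of past queries

    def check(self, query):
        """Return True if the query resembles any stored query, then store it."""
        emb = self.encoder(query)
        best = max(
            (cosine_similarity(emb, past) for past in self.history),
            default=0.0,
        )
        self.history.append(emb)
        return best >= self.threshold


def identity_encoder(pixels):
    """Toy stand-in for a learned image encoder (assumption, not CLIP)."""
    return [float(p) for p in pixels]


detector = StatefulDetector(identity_encoder, threshold=0.99)
clean_a = [0.9, 0.1, 0.0, 0.2]
clean_b = [0.0, 0.8, 0.7, 0.1]
perturbed_a = [0.91, 0.1, 0.01, 0.2]  # small perturbation of clean_a

flags = [detector.check(q) for q in (clean_a, clean_b, perturbed_a)]
print(flags)  # [False, False, True]
```

In this toy run the third query is flagged because it is a slight perturbation of the first, which mirrors the observation that successive attack queries are near-duplicates. The number of queries needed before a flag is raised corresponds to the "shots" mentioned in the abstract.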
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: In the past decade, deep neural networks (DNNs) have achieved remarkable results across a wide range of fields, including computer vision, natural language processing, and multimodal learning. Despite these advancements, studies have shown that DNNs are extremely vulnerable to small adversarial perturbations at the inference stage. This has raised serious security concerns about the development of DNNs for safety-critical scenarios, such as autonomous driving and medical diagnosis. To address this, we propose a novel Adversarial Contrastive Prompt Tuning (ACPT) framework that can train robust feature extractors for stateful detection, aimed at detecting query-based black-box adversarial attacks. Our work contributes to improving the safety of multimodal systems.
Supplementary Material: zip
Submission Number: 2280