Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices
TL;DR: We robustify VLMs by aligning entire adversarial simplices rather than individual adversarial samples with classifier scores of clean samples.
Abstract: Vision-Language Models (VLMs) such as CLIP excel at zero-shot classification due to large-scale pre-training but are vulnerable to adversarial examples. Adversarial fine-tuning robustifies zero-shot models by aligning prediction scores of individual adversaries with their clean counterparts, which typically overlooks intermediate adversarial samples along the adversarial trajectory crossing the decision boundary. Such intermediate adversaries and their vicinity produce informative representations capturing the decision boundary in detail. They can be improved by sampling adversarial candidates from simplices formed by joining two consecutive vertices on the adversarial trajectory and their clean counterpart. However, sampling simplices for adversaries is very costly. To train robust VLM, we overcome these limitations by Taylor expansion and formulating an upper-bound of alignment loss that depends on the Jacobian/Hessian obtained at clean samples. As regions between clean and intermediate adversarial samples capture a larger decision landscape, we robustify VLM by plausible adversaries from simplices by our closed-form formulation equivalent to infinite uniform sampling of the simplex. We obtain state-of-the-art robustness across 15 datasets and diverse vision-language tasks.
Lay Summary: Adversarial examples pose a security threat to AI systems by perturbing their decisions. To counteract this type of attack, we generate adversarial samples and encapsulate them into an adversarial simplex, which is used to robustify VLM against adversarial attacks.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Deep Learning->Robustness
Keywords: Vision-Language Models, Adversarial Examples, Zero-Shot Classification
Submission Number: 2781
Loading