The Attack Means Nothing: Test-time Adversarial Defense Improves Zero-shot Adversarial Robustness for Medical Vision-Language Models

17 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: test-time adversarial defense, medical vision-language model, classification, deep learning
Abstract: Vision-language models (VLMs), exemplified by CLIP, have achieved remarkable zero-shot generalization but remain highly vulnerable to imperceptible adversarial perturbations, posing significant safety threats, particularly in medical scenarios. In this paper, we first prove that VLMs are far more robust to weak input transformations than adversarial attacks are. Building on this insight, we propose The Attack Means Nothing (TAME), a simple yet effective test-time defense paradigm for improving the zero-shot adversarial robustness of medical VLMs. We conduct comprehensive experiments on 11 medical datasets spanning 9 imaging modalities against three representative white-box attacks (PGD, C&W, and AutoAttack), using BiomedCLIP with a ViT-B/16 backbone as the victim model. Extensive experimental results demonstrate that TAME consistently outperforms other defense methods across all attack types, boosting vanilla BiomedCLIP by +47.47% under PGD, +46.73% under C&W, and +47.79% under AutoAttack, while maintaining competitive clean accuracy. These substantial improvements also suggest a potential risk of label leakage during attacks. Furthermore, TAME is plug-and-play and can be integrated with other adversarially fine-tuned VLMs to further enhance their defense capabilities. These findings support a practical and generalizable approach to deploying medical VLMs in clinical scenarios in the presence of adversaries. Code will be available on GitHub.
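The abstract does not detail TAME's mechanism, but its stated premise (model predictions survive weak transformations while adversarial perturbations do not) suggests a transform-and-vote test-time defense. The sketch below is an illustrative reconstruction under that assumption, not the authors' implementation; `weak_transforms`, `tame_predict`, and the stub classifier are all hypothetical names, with small random pixel shifts standing in for whatever weak transformations the paper actually uses.

```python
import numpy as np

def weak_transforms(image, rng, n_views=8, max_shift=2):
    """Generate weakly transformed views via small random shifts.
    Premise (assumed from the abstract): such transforms barely
    affect the true class but tend to break an adversarial
    perturbation crafted for the exact input pixels."""
    views = []
    for _ in range(n_views):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        views.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return views

def tame_predict(image, classify, n_views=8, seed=0):
    """Hypothetical test-time defense: classify each weakly
    transformed view and return the majority-vote label."""
    rng = np.random.default_rng(seed)
    votes = [classify(v) for v in weak_transforms(image, rng, n_views)]
    return max(set(votes), key=votes.count)

# Toy demo with a stub classifier standing in for a zero-shot VLM:
# predicts class 1 iff the centre crop is mostly bright.
def stub_classify(img):
    return int(img[8:24, 8:24].mean() > 0.5)

clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0           # bright centre square -> class 1
print(tame_predict(clean, stub_classify))  # -> 1
```

In a real pipeline, `classify` would be a zero-shot call to the victim VLM (e.g. BiomedCLIP) and the vote could instead average logits over views; the plug-and-play claim in the abstract is consistent with this wrapper-style design, since it needs only black-box access to the model.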
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9110