Keywords: CLIP, adversarial robustness, adversarial training, robust finetuning
Abstract: Pretrained vision-language models (VLMs) like CLIP have been shown to be highly susceptible to adversarial perturbations. Adversarial finetuning (AFT) approaches have been proposed to improve the zero-shot adversarial robustness of CLIP on various downstream tasks; these methods finetune the vision encoder on adversarial images generated from a proxy classification dataset, such as TinyImageNet. However, we demonstrate that existing AFT approaches have largely overlooked the important role of the training recipe, particularly the training data and objective. To this end, we propose Adversarially Finetune Like You Pretrain (AdvFLYP), which retains the recipe of CLIP's pretraining during AFT as closely as practical: we finetune CLIP on adversarial images generated from web-scale image-text data, using a contrastive loss. Experiments validate the superiority of AdvFLYP on various downstream datasets. For example, AdvFLYP outperforms existing AFT approaches finetuned on TinyImageNet (ImageNet) by 19.1% (3.1%), averaged over 14 downstream datasets. Further analyses show that a sufficiently large amount of training data and sufficiently large batch sizes are crucial for the contrastive learning of AdvFLYP. Our code and model checkpoints will be released.
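The abstract describes AFT with CLIP's own contrastive objective: adversarial images are generated against the image-text contrastive loss, then the vision encoder is updated on them. Below is a minimal PyTorch sketch of one such training step, assuming a CLIP-style model with `encode_image`, `encode_text`, and a learnable `logit_scale` (as in `open_clip`). The function names (`pgd_attack`, `clip_loss`, `train_step`), hyperparameters, and the choice to freeze the text encoder are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of adversarial contrastive finetuning (assumptions: images in [0, 1],
# L-inf PGD threat model, frozen text encoder; all names are hypothetical).
import torch
import torch.nn.functional as F

def clip_loss(image_feats, text_feats, logit_scale):
    """Symmetric InfoNCE loss over an image-text batch (CLIP's objective)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = logit_scale * image_feats @ text_feats.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def pgd_attack(model, images, text_feats, eps=4/255, alpha=1/255, steps=10):
    """L-inf PGD on the images, maximizing the contrastive loss."""
    adv = images.clone().detach()
    adv += torch.empty_like(adv).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        adv.requires_grad_(True)
        img_feats = model.encode_image(adv)
        loss = clip_loss(img_feats, text_feats, model.logit_scale.exp())
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()           # ascent step
            adv = images + (adv - images).clamp(-eps, eps)  # project to ball
            adv = adv.clamp(0, 1).detach()            # keep valid pixel range
    return adv

def train_step(model, optimizer, images, texts):
    """One AFT step: attack w.r.t. the contrastive loss, then train on it."""
    with torch.no_grad():
        text_feats = model.encode_text(texts)  # text encoder kept frozen here
    adv_images = pgd_attack(model, images, text_feats)
    img_feats = model.encode_image(adv_images)
    loss = clip_loss(img_feats, text_feats, model.logit_scale.exp())
    optimizer.zero_grad()
    loss.backward()  # gradients flow only through the vision encoder
    optimizer.step()
    return loss.item()
```

Per the abstract's analysis, this loop would be run with large batches over web-scale image-text pairs, since the contrastive loss relies on many in-batch negatives.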
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 5221