Adversarially Robust CLIP Models Induce Better (Robust) Perceptual Metrics

Published: 03 Jul 2024, Last Modified: 17 Jul 2024 | ICML 2024 FM-Wild Workshop Poster | CC BY 4.0
Keywords: perceptual metrics, CLIP, adversarial robustness
TL;DR: We show that adversarially trained CLIP models induce perceptual similarity metrics with SOTA (zero-shot) clean performance and robustness.
Abstract: Measuring perceptual similarity is a key tool in computer vision. In recent years, perceptual metrics based on features extracted from neural networks trained on large and diverse datasets, e.g. CLIP, have become popular. At the same time, metrics built on neural network features are not adversarially robust. In this paper we show that adversarially robust CLIP models induce *better* and *adversarially robust* perceptual metrics that outperform existing metrics in a zero-shot setting and, after fine-tuning, match the performance of state-of-the-art metrics while remaining robust. Notably, these perceptual metrics enable adversarially robust NSFW content detection. Finally, the perceptual metrics induced by robust CLIP models are more interpretable: feature inversion can show which images are considered similar, while text inversion can find which images are associated with a given prompt. This also allows us to visualize the very rich visual concepts learned by a CLIP model, including memorized persons, paintings and complex queries.
Submission Number: 73
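
As a rough illustration of what a perceptual metric "induced" by an image encoder can look like, the sketch below assumes the metric is the cosine distance between embeddings and uses it to pick which of two candidates is closer to a reference. The abstract does not state the exact distance or evaluation protocol, so both are assumptions here; the vectors are toy stand-ins for the output of a (hypothetical) CLIP image encoder, and the function names are illustrative only.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors (assumed metric)."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def closer_candidate(ref, cand_a, cand_b):
    """Return 0 if cand_a is at least as close to ref as cand_b, else 1."""
    d_a = cosine_distance(ref, cand_a)
    d_b = cosine_distance(ref, cand_b)
    return 0 if d_a <= d_b else 1

# Toy embeddings; in practice these would come from an image encoder
# applied to a reference image and two candidate images.
ref = np.array([1.0, 0.0, 0.0])
near = np.array([0.9, 0.1, 0.0])
far = np.array([0.0, 1.0, 0.0])

choice = closer_candidate(ref, near, far)
```

In this framing, robustness of the metric would mean that small adversarial perturbations of the input images do not flip `choice`; the toy vectors above do not model that.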