Towards a Self-Made Model: Zero-Shot Self-Supervised Purification for Adversarial Attacks

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: adversarial attacks, adversarial defense, adversarial purification
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Zero-Shot Learning for Adversarial Purification
Abstract: Adversarial purification is an adversarial defense method without robustness training for the classifier and regardless of the form of attacks, aiming to remove the adversarial perturbations on the attacked images. Such methods can defend against various unseen threats without modifying the classifier in contrast to empirical defenses. However, previous purification methods require careful training of a strong generative model or incorporating additional knowledge when training a classifier to be comparable to adversarial training. Retraining promising generative models or classifiers on large-scale datasets (e.g., ImageNet) is extremely challenging and computation-consuming. In this work, following the natural image manifold hypothesis, we propose a zero-shot self-supervised method for adversarial purification named \textit{ZeroPur}: For an adversarial example that lies beyond the natural image manifold, its corrupted embedding vector is first restored so that it is moved close to the natural image manifold. The embedding is then fine-tuned on finer intermediate-level discrepancies to project it back within the manifold. The whole purification process is done from coarse to fine, which does not rely on any generative model and does not require retraining the classifier to incorporate additional knowledge. Extensive experiments on three datasets including CIFAR-10, CIFAR-100, and ImageNet with various classifier architectures including ResNet and WideResNet, demonstrate that our method achieves state-of-the-art robust performance. Code released.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6991
Loading