Rethinking the Necessity of Labels in Backdoor Removal

Published: 04 Mar 2023, Last Modified: 27 Apr 2023 · ICLR 2023 BANDS Spotlight
Abstract: Since training a model from scratch requires massive computational resources, it has become popular to download pre-trained backbones from third-party platforms and deploy them in various downstream tasks. While convenient, this practice also introduces potential security risks such as backdoor attacks, which cause targeted misclassification for any input image containing a specifically defined trigger (\textit{i.e.}, backdoored examples). Current backdoor removal methods rely on clean labeled data, so safely deploying a pre-trained model in downstream tasks still demands these costly or hard-to-obtain labels. In this paper, we focus on purifying a backdoored backbone with only unlabeled data. To evoke the backdoor patterns without labels, we propose to leverage an unsupervised contrastive loss to search for backdoors in the feature space. Surprisingly, we find that we can mimic backdoored examples with adversarial examples crafted by the contrastive loss, and erase them with adversarial finetuning. We therefore name our method \textit{Contrastive Backdoor Defense} (CBD). Extensive experiments against several backdoored backbones from both supervised and self-supervised learning demonstrate that CBD, without using labels, achieves comparable or even better defense performance than label-based methods, allowing practitioners to safely deploy pre-trained backbones on downstream tasks without extra labeling costs.
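The abstract does not include implementation details, but the overall procedure it describes (craft adversarial examples on unlabeled data via a contrastive loss in feature space, then adversarially finetune the backbone) can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' released code: the InfoNCE-style loss, the PGD-style perturbation budget (`eps`, `alpha`, `steps`), and the choice to re-align adversarial and clean features during finetuning are assumptions made here for concreteness.

```python
# Hedged sketch of a contrastive, label-free backdoor purification loop.
# Assumes `backbone(images)` returns a (B, D) feature tensor on unlabeled images.
import torch
import torch.nn.functional as F

def contrastive_loss(feat_a, feat_b, temperature=0.5):
    """InfoNCE-style loss: (feat_a[i], feat_b[i]) are positive pairs,
    all other pairs in the batch serve as negatives."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature                      # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

def craft_adversarial(backbone, images, steps=10, eps=8 / 255, alpha=2 / 255):
    """PGD-style search for perturbations that maximize the contrastive loss,
    i.e. push perturbed features away from their clean counterparts."""
    backbone.eval()
    with torch.no_grad():
        clean_feat = backbone(images)
    delta = torch.zeros_like(images).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv_feat = backbone((images + delta).clamp(0, 1))
        loss = contrastive_loss(adv_feat, clean_feat)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()            # ascend on the loss
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (images + delta.detach()).clamp(0, 1)

def adversarial_finetune(backbone, unlabeled_loader, epochs=5, lr=1e-3):
    """Finetune the backbone so adversarial and clean views agree again,
    with the aim of erasing trigger-aligned features; no labels are used."""
    optimizer = torch.optim.SGD(backbone.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images in unlabeled_loader:                   # yields images only
            adv = craft_adversarial(backbone, images)
            backbone.train()
            feat_clean = backbone(images)
            feat_adv = backbone(adv)
            loss = contrastive_loss(feat_adv, feat_clean) # descend: re-align
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return backbone
```

The exact objective and schedule used by CBD may differ; the sketch only mirrors the two stages named in the abstract, namely contrastive-loss-driven adversarial example crafting followed by adversarial finetuning on unlabeled data.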