Clean-label Backdoor Attacks by Selectively Poisoning with Limited Information from Target Class

Published: 28 Oct 2023, Last Modified: 13 Mar 2024
Venue: NeurIPS 2023 BUGS Poster
Keywords: backdoor attacks, sample selection
TL;DR: This paper improves clean-label backdoor attacks by selecting which samples to poison, with access only to target-class data.
Abstract: Deep neural networks have been shown to be vulnerable to backdoor attacks, in which an adversary manipulates the training dataset so that the model misbehaves whenever a trigger appears while still behaving normally on benign data. Clean-label attacks succeed without modifying the semantic labels of the poisoned data, which makes them stealthier but also more challenging to mount. To control the victim model, existing works add triggers to a random subset of the dataset, neglecting the fact that samples contribute unequally to the success of the attack, and therefore do not exploit the full potential of the backdoor. Some recent studies propose strategies that select samples by recording forgetting events or by looking for hard samples with a model trained under supervision. However, these methods require training and assume that the attacker has access to the whole labeled training set, which is not always the case in practice. In this work, we consider a more practical setting in which the attacker provides only a subset of the dataset with the target label and has no knowledge of the victim model, and we propose a method that selects the samples to poison more effectively. Our method takes advantage of pretrained self-supervised models, so it incurs no extra computational cost for training and can be applied to any victim model. Experiments on benchmark datasets illustrate the effectiveness of our strategy in improving clean-label backdoor attacks. Our strategy helps SIG reach a 91% success rate with only a 10% poisoning ratio.
Submission Number: 35
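
Illustrative sketch (not the authors' code): the abstract describes selecting target-class samples with features from a pretrained self-supervised model and then poisoning them with a clean-label trigger such as SIG, but it does not state the selection criterion. The example below assumes, purely for illustration, that the samples farthest from the target-class feature mean are chosen. The names `select_far_from_mean` and `add_sig_trigger` are hypothetical, the random vectors stand in for encoder outputs, and the `delta` and `freq` values are placeholders. Labels are never touched, which is what keeps the attack clean-label.

```python
import math
import random


def select_far_from_mean(features, ratio):
    """Return indices of the round(ratio * n) vectors farthest from the mean.

    ASSUMPTION: distance to the class mean is a stand-in criterion; the
    abstract does not say how samples are actually ranked.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    dist = [math.dist(f, mean) for f in features]
    k = max(1, round(ratio * n))
    return sorted(range(n), key=lambda i: dist[i], reverse=True)[:k]


def add_sig_trigger(image, delta=20.0, freq=6):
    """Superimpose a horizontal sinusoid, in the style of SIG, on a grayscale
    image given as a list of rows. Pixel values are clipped to [0, 255].
    """
    width = len(image[0])
    return [
        [
            min(255.0, max(0.0, px + delta * math.sin(2 * math.pi * j * freq / width)))
            for j, px in enumerate(row)
        ]
        for row in image
    ]


# Toy data: 50 target-class samples with 8-dimensional placeholder features
# (in practice these would come from a pretrained self-supervised encoder).
random.seed(0)
features = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
images = [[[128.0] * 16 for _ in range(16)] for _ in range(50)]

# Poison 10% of the target-class samples; every label stays as it was.
chosen = select_far_from_mean(features, ratio=0.10)
poisoned = {i: add_sig_trigger(images[i]) for i in chosen}
```

Because only feature extraction and a sort are involved, a selection step of this kind needs no training and no knowledge of the victim model, which matches the setting the abstract describes.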