Stealthy Backdoor Attack via Confidence-driven Sampling

Published: 27 Nov 2024, Last Modified: 27 Nov 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: Backdoor attacks enable unauthorized control at test time by injecting carefully crafted triggers during the training phase of deep neural networks. Previous works have focused on improving the stealthiness of the trigger while selecting the samples to attack at random. However, we find that random selection harms the stealthiness of the attack: it introduces significant pitfalls that make the attack more detectable and easier to defend against. To improve the stealthiness of existing attacks, we introduce a method that strategically poisons samples near the model's decision boundary, so that the model's behavior (its decision boundary) changes as little as possible before and after backdooring. Our main insight for identifying boundary samples is to use confidence scores as a proxy for proximity to the decision boundary and to select those samples for poisoning (trigger injection). The proposed approach makes it significantly harder for defenders to identify the attack. Our method is versatile and independent of any specific trigger design. We provide theoretical insights and conduct extensive experiments to demonstrate its effectiveness.
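The abstract describes selecting low-confidence (near-boundary) samples to poison rather than sampling at random. Below is a minimal sketch of that idea, assuming a surrogate classifier `model`, a tensor of clean images and labels, and an `apply_trigger` function; all names, the poison rate, and the use of max softmax probability as the confidence score are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: confidence-driven selection of samples to poison.
# Assumes `model`, `images`, `labels`, and `apply_trigger` are provided
# elsewhere; these are placeholders, not the paper's released code.
import torch
import torch.nn.functional as F


def confidence_scores(model, images, batch_size=256, device="cpu"):
    """Max softmax probability per sample (higher = more confident)."""
    model.eval().to(device)
    scores = []
    with torch.no_grad():
        for i in range(0, len(images), batch_size):
            batch = images[i:i + batch_size].to(device)
            probs = F.softmax(model(batch), dim=1)
            scores.append(probs.max(dim=1).values.cpu())
    return torch.cat(scores)


def select_poison_indices(model, images, poison_rate=0.05):
    """Pick the least-confident samples, i.e. those near the decision boundary."""
    scores = confidence_scores(model, images)
    num_poison = int(poison_rate * len(images))
    return torch.argsort(scores)[:num_poison]  # ascending: lowest confidence first


def poison_dataset(images, labels, indices, target_label, apply_trigger):
    """Stamp the trigger on the selected samples and relabel them to the target class."""
    poisoned_images, poisoned_labels = images.clone(), labels.clone()
    for idx in indices:
        poisoned_images[idx] = apply_trigger(poisoned_images[idx])
        poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels
```

In this sketch the trigger design is left entirely to `apply_trigger`, mirroring the paper's claim that the selection strategy is independent of the specific trigger used.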
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 3086