Abstract: Backdoor attacks have become a security threat to deep neural networks (DNNs): an attacker embeds a secret behavior into a DNN by poisoning a small number of training samples. To address this threat, some defense strategies employ outlier detection algorithms to identify poisoned samples in the hidden representation space. However, these defenses remain vulnerable to adaptive attacks, because their assumption that poisoned and benign samples are separable in representation space can be broken. In this paper, we aim to boost existing defenses by leveraging insights from label smoothing, demonstrating its effectiveness in distinguishing poisoned from benign samples. Our analysis uncovers the role of label smoothing as a regularization technique that enhances hidden class separability in the penultimate layer of a model. Building on label smoothing, we introduce Learning Speed-driven Label Smoothing (LS2), a simple yet novel approach that assigns an adaptive smoothing rate to each sample based on the model’s “learning speed” for that sample. Extensive results show that LS2 increases the separability of poisoned and benign samples, enhancing the efficacy of defenses that rely on hidden separability. Combined with LS2, existing hidden-separation-based defenses achieve state-of-the-art poison sample removal rates (Prm) against adaptive attacks. Code is available at https://github.com/JiePeng104/LS2
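To make the idea of a per-sample smoothing rate concrete, the following is a minimal sketch of standard label smoothing in which each sample carries its own rate. It is not the LS2 implementation: the abstract does not say how "learning speed" is measured or how it maps to a rate, so the rates here are simply supplied as an input list, and the names `smooth_targets` and `smoothed_cross_entropy` are hypothetical.

```python
import math


def smooth_targets(labels, rates, num_classes):
    """Build soft targets with a per-sample smoothing rate.

    For a sample with label y and rate a, the target puts (1 - a) on
    class y and spreads a / num_classes uniformly over all classes.
    The rates are an assumed input; how LS2 derives them from learning
    speed is not specified in the abstract.
    """
    targets = []
    for y, a in zip(labels, rates):
        if not 0.0 <= a <= 1.0:
            raise ValueError("smoothing rate must lie in [0, 1]")
        row = [a / num_classes] * num_classes
        row[y] += 1.0 - a
        targets.append(row)
    return targets


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def smoothed_cross_entropy(logits_batch, labels, rates):
    """Per-sample cross-entropy against the smoothed targets."""
    num_classes = len(logits_batch[0])
    targets = smooth_targets(labels, rates, num_classes)
    losses = []
    for logits, target in zip(logits_batch, targets):
        logp = log_softmax(logits)
        losses.append(-sum(t * lp for t, lp in zip(target, logp)))
    return losses
```

A rate of 0 recovers the ordinary one-hot target, while a rate of 1 gives the uniform distribution, so the per-sample rate controls how strongly each sample's label is softened.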
External IDs: dblp:journals/tifs/PengYHZHDZ25