Bi-perspective Splitting Defense: Achieving Clean-Data-Free Backdoor Security

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0
Keywords: Trustworthy AI, Backdoor Defense, Deep Neural Networks
Abstract: Backdoor attacks have seriously threatened deep neural networks (DNNs) by embedding concealed vulnerabilities through data poisoning. To counteract these attacks, training benign models from poisoned data has garnered considerable interest from researchers. High-performing defenses often rely on additional clean subsets, which is untenable given increasing privacy concerns and data scarcity. In the absence of clean subsets, defenders resort to complex feature extraction and analysis, resulting in excessive overhead and compromised performance. To address these challenges, we identify that the key lies in making sufficient use of the easier-to-obtain target labels and in excavating clean hard samples. In this work, we propose a Bi-perspective Splitting Defense (BSD). BSD splits the dataset using both semantic and loss-statistics characteristics, through open set recognition-based splitting (OSS) and altruistic model-based data splitting (ALS) respectively, achieving a good initialization of the clean pool. BSD further introduces class completion and selective dropping strategies in the subsequent pool updates to avoid the potential class underfitting and backdoor overfitting caused by loss-guided splitting. Through extensive experiments on 3 benchmark datasets and against 7 representative attacks, we empirically demonstrate that BSD is robust across various attack settings. Specifically, BSD improves the Defense Effectiveness Rating (DER) by 16.29% on average compared to 5 state-of-the-art defenses, achieving clean-data-free backdoor security with minimal compromise in both Clean Accuracy (CA) and Attack Success Rate (ASR).
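The abstract reports results in terms of DER without stating its formula. As a rough illustration only, the sketch below assumes the definition commonly used in the backdoor-defense literature, DER = (max(0, ΔASR) − max(0, ΔCA) + 1) / 2, where ΔASR and ΔCA are the drops in attack success rate and clean accuracy after applying a defense. The function name, argument names, and example numbers are hypothetical and are not taken from the paper.

```python
def defense_effectiveness_rating(ca_before, ca_after, asr_before, asr_after):
    """Assumed DER definition; all inputs are fractions in [0, 1].

    ca_before / asr_before: clean accuracy and attack success rate of the
    undefended (backdoored) model; ca_after / asr_after: the same metrics
    after the defense. These names are illustrative, not the paper's.
    """
    asr_drop = max(0.0, asr_before - asr_after)  # reward for lowering ASR
    ca_drop = max(0.0, ca_before - ca_after)     # penalty for losing accuracy
    return (asr_drop - ca_drop + 1.0) / 2.0


# Hypothetical numbers: ASR falls from 99% to 1%, CA falls from 94% to 92%.
print(defense_effectiveness_rating(0.94, 0.92, 0.99, 0.01))
```

Under this assumed definition, a defense that changes neither metric scores 0.5, and the score approaches 1 as the ASR is driven down without any loss of clean accuracy.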
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9597