Abstract: Patch-based data-poisoning backdoor attacks have exposed the vulnerability of deep neural networks (DNNs). While differentially private training is a promising defense, it faces two significant challenges: 1) limiting the fitting of clean and poisoned samples simultaneously degrades clean accuracy, and 2) model stability is hard to maintain when poisoned samples dominate the target class. To address these challenges, we propose a Bi-optimization Training Strategy, which integrates robust training with poisoned-sample filtering and performs asynchronous optimization of the two objectives. To implement this strategy, we combine differentially private training with the confusion training method in a practical defense framework (DPC), which filters out poisoned samples and retrains the model on the remaining data. To take full advantage of the inherent stability of differentially private training even when poisoned samples dominate the target class, we adopt self-supervised pre-training so that poisoned samples appear as outliers in the latent space. Supervised fine-tuning enhanced with differential privacy then effectively limits the fitting of these poisoned samples. Additionally, we adaptively adjust the strength of the differential-privacy protection based on insights from the filtered samples, improving the fitting of clean samples and further strengthening poisoned-sample detection. Finally, extensive experiments demonstrate that DPC (our code is publicly available at https://github.com/yyk1997/DPC) preserves clean accuracy effectively while providing robust backdoor protection.
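To make the mechanism the abstract alludes to concrete, below is a minimal sketch of a DP-SGD-style update: clipping each per-sample gradient and adding Gaussian noise bounds the influence any single (possibly poisoned) example can have on the model, which is the property the defense relies on. This is an illustration under our own assumptions, not the authors' implementation; the function name `dp_sgd_step` and all hyperparameters are hypothetical.

```python
# Illustrative DP-SGD-style update (assumption: not the DPC code).
# Clipping each per-sample gradient and adding Gaussian noise limits how
# much any single example, including a poisoned one, can move the model.
import torch


def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One differentially private update over a mini-batch (naive loop for clarity)."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Clip the full per-sample gradient to L2 norm <= clip_norm.
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            # Gaussian noise calibrated to the clipping norm.
            noise = torch.randn_like(s) * noise_mult * clip_norm
            p.add_(-(lr / len(xs)) * (s + noise))
```

In this reading, the "adaptive strength" mentioned in the abstract would correspond to tuning `clip_norm` or `noise_mult` using feedback from the filtered samples; the exact schedule is specified in the paper, not here.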