Generalizable poisoning-resistant backdoor detection and removal framework: From dataset perspective

Published: 24 Nov 2025, Last Modified: 27 Jan 2026 · Pattern Recognition · CC BY 4.0
Abstract: Backdoor attacks pose a severe threat to dataset integrity within the Deep Learning (DL) paradigm. Although various defenses have been proposed in recent work, these approaches often have limited generalizability and effectiveness. We therefore introduce a generalizable framework, GBDR, that detects and removes backdoors from untrustworthy datasets without requiring any knowledge of the attack specifications or modifying the default trained models. First, we identify a phenomenon termed the Model Capacity Effect (MCE), in which backdoor and clean samples exhibit distinct behavior across models of varying capacity. Motivated by MCE, a detection model with lower capacity is carefully customized to differentiate between backdoor and clean samples, and a theoretical analysis is provided. We then design a purification method that removes triggers from backdoor samples and restores their true labels, based on a diffusion process and a discriminator, respectively. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our GBDR framework, which surpasses state-of-the-art defenses. This advancement not only enhances the robustness of models against backdoor attacks but also contributes to a broader understanding of dataset safety and integrity in model training. Code is available at https://github.com/brother2cat/GBDR
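To make the MCE intuition concrete, the following is a minimal, illustrative PyTorch sketch of capacity-based sample screening: train a deliberately low-capacity model on the untrusted dataset, then flag samples it fits unusually well. This is not the paper's implementation; the `LowCapacityNet` architecture, the per-sample-loss criterion, and the quantile threshold are all assumptions made purely for illustration.

```python
# Illustrative sketch (not GBDR itself): a low-capacity model is trained briefly
# on the untrusted dataset, and samples it fits unusually well are flagged as
# suspicious. The loss-based rule below is an assumed proxy for the MCE idea.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class LowCapacityNet(nn.Module):
    """A small CNN used only for detection, not as the final task model (assumed architecture)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def per_sample_losses(model, dataset, device="cpu"):
    """Cross-entropy loss of every sample under the briefly trained small model."""
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in DataLoader(dataset, batch_size=256):
            logits = model(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.cat(losses)

def flag_suspicious(dataset, epochs=3, quantile=0.1, device="cpu"):
    """Train the low-capacity model, then flag the lowest-loss samples (assumed criterion)."""
    model = LowCapacityNet().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loader = DataLoader(dataset, batch_size=128, shuffle=True)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    losses = per_sample_losses(model, dataset, device)
    threshold = torch.quantile(losses, quantile)
    return losses <= threshold  # boolean mask over the dataset

if __name__ == "__main__":
    # Synthetic stand-in data so the sketch runs end to end.
    xs, ys = torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))
    mask = flag_suspicious(TensorDataset(xs, ys))
    print(f"flagged {int(mask.sum())} of {len(mask)} samples as suspicious")
```

In practice the flagged subset would then be handed to a purification stage such as the diffusion-plus-discriminator procedure the abstract describes, rather than simply discarded.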