Keywords: Backdoor Defense, Backdoor Attack
TL;DR: We propose CMP, the first backdoor defense that is (1) truly data-free and threshold-free, and (2) the only algorithm that works at practical scale.
Abstract: The widespread adoption of pre-trained neural networks from unverified sources has heightened concerns about backdoor attacks. These attacks cause networks to misbehave on inputs containing specific triggers while maintaining normal performance otherwise. Existing defenses typically rely on pruning, under the assumption that backdoors are encoded in a small set of specific neurons. This approach, however, is ineffective on large-scale models, where phenomena such as polysemanticity make it difficult to isolate malicious neurons without harming model performance. Furthermore, pruning-based methods are impractical because they require calibration data, often unavailable, to determine critical thresholds, limiting their deployment in real-world scenarios. We introduce Calibration-free Model Purification (CMP), a novel, completely data-free defense that avoids pruning entirely. CMP leverages a self-distillation framework guided by our discovery of a systematic "prediction skew" as the fundamental mechanism of backdoor transfer during knowledge distillation. It employs a dual-filtering system that counteracts this skew, preventing the student model from inheriting the teacher's malicious behavior. On the challenging ImageNet dataset, CMP reduces attack success rates to near zero across diverse attacks while preserving clean accuracy, outperforming existing methods. Our work presents the first scalable, threshold-free defense, offering a practical solution for real-world AI security.
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16307