Keywords: backdoor attack, backdoor defense, adversarial ML
TL;DR: UniBP is a fine-tuning–based, data-efficient, attack-agnostic defense that uses ~1% of data to cut backdoor ASR from >90% to <5% across diverse attacks and benchmarks.
Abstract: Deep neural networks (DNNs) remain vulnerable to backdoor attacks, perpetuating an arms race between attacks and defenses. Despite their efficacy against classical threats, mainstream defenses often fail under more advanced, defense-aware attacks, particularly clean-label variants that can evade decision-boundary shifting and neuron-pruning defenses. We present UniBP, a universal post-training defense that operates with only 1\% of the original training data and unveils the relationship between batch normalization (BN) behavior and backdoor effects.
At a high level, UniBP scrutinizes the affine parameters and statistics of BN layers using a small clean subset (as small as 1\% of the training data) to identify the affine parameters most responsible for reactivating the backdoor, then prunes them and applies masked fine-tuning to remove the backdoor effects. We compare our method against 5 SOTA defenses across 5 backdoor attacks and various attack/defense conditions, and show that UniBP consistently reduces the attack success rate from more than 90\% to less than 5\% while preserving clean performance, whereas other baselines degrade under smaller fine-tuning sets or stronger poisoning techniques.
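The pruning step described above could be sketched roughly as follows. This is a hypothetical illustration only: the saliency criterion (a Taylor-style score, `|gamma * grad|`), the prune ratio, and the function name `prune_bn_affine` are assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of scoring and pruning BN affine parameters on a
# small clean subset; the saliency criterion and prune ratio are assumed.
import torch
import torch.nn as nn

def prune_bn_affine(model, clean_loader, criterion, prune_ratio=0.05):
    """Score each BN affine parameter (gamma) by gradient saliency on a
    small clean subset, then zero out the highest-scoring entries."""
    model.train()
    model.zero_grad()
    # Accumulate gradients of the clean loss w.r.t. BN parameters.
    for x, y in clean_loader:
        loss = criterion(model(x), y)
        loss.backward()
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            # Taylor-style importance score (an assumption, not the paper's).
            saliency = (m.weight * m.weight.grad).abs()
            k = max(1, int(prune_ratio * saliency.numel()))
            top = saliency.topk(k).indices
            mask = torch.ones_like(m.weight)
            mask[top] = 0.0
            with torch.no_grad():
                m.weight.mul_(mask)   # prune gamma
                m.bias[top] = 0.0     # zero the matching beta
            masks[name] = mask        # reuse during masked fine-tuning
    return masks
```

The returned masks would then be applied during fine-tuning (e.g., by multiplying BN gradients by the mask each step) so that the pruned parameters stay at zero while the rest of the network recovers clean accuracy.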
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22565