Keywords: backdoor attack, backdoor defense, adversarial ML
TL;DR: UniBP is a fine-tuning–based, data-efficient, attack-agnostic defense that uses ~1% of data to cut backdoor ASR from >90% to <5% across diverse attacks and benchmarks.
Abstract: Deep neural networks (DNNs) remain vulnerable to backdoor attacks, perpetuating an arms race between attacks and defenses. Despite their efficacy against classical threats, mainstream defenses often fail under more advanced, defense-aware attacks, particularly clean-label variants that can evade decision-boundary shifting and neuron-pruning defenses. We present UniBP, a universal post-training defense that operates with only 1\% of the original training data and unveils the relationship between batch normalization (BN) behavior and backdoor effects.
At a high level, UniBP scrutinizes the affine parameters and statistics of BN layers using a small clean subset (as small as 1\% of the training data) to identify the affine parameters most responsible for reactivating the backdoor, then prunes them and applies masked fine-tuning to remove the backdoor effects. We compare our method against 5 SOTA defenses across 5 backdoor attacks and various attack/defense conditions, and show that UniBP consistently reduces the attack success rate from more than 90\% to less than 5\% while preserving clean performance, whereas other baselines degrade under smaller fine-tuning sets or stronger poisoning techniques.
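The pruning step described above could be sketched roughly as follows. This is a hypothetical illustration only: the saliency criterion (a Taylor-style score, `|gamma * grad|`), the prune ratio, and the function name `prune_bn_affine` are assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of scoring and pruning BN affine parameters on a
# small clean subset; the saliency criterion and prune ratio are assumed.
import torch
import torch.nn as nn

def prune_bn_affine(model, clean_loader, criterion, prune_ratio=0.05):
    """Score each BN affine parameter (gamma) by gradient saliency on a
    small clean subset, then zero out the highest-scoring entries."""
    model.train()
    model.zero_grad()
    # Accumulate gradients of the clean loss w.r.t. BN parameters.
    for x, y in clean_loader:
        loss = criterion(model(x), y)
        loss.backward()
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            # Taylor-style importance score (an assumption, not the paper's).
            saliency = (m.weight * m.weight.grad).abs()
            k = max(1, int(prune_ratio * saliency.numel()))
            top = saliency.topk(k).indices
            mask = torch.ones_like(m.weight)
            mask[top] = 0.0
            with torch.no_grad():
                m.weight.mul_(mask)   # prune gamma
                m.bias[top] = 0.0     # zero the matching beta
            masks[name] = mask        # reuse during masked fine-tuning
    return masks
```

The returned masks would then be applied during fine-tuning (e.g., by multiplying BN gradients by the mask each step) so that the pruned parameters stay at zero while the rest of the network recovers clean accuracy.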
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22565