Adversarial Unlearning of Poisoned Features for Backdoor Defense in Federated Learning

Jinjie Xiao; Qiang Zhou; Wenya Wang; Sinno Jialin Pan

17 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Federated learning, backdoor defense

Abstract: Federated learning (FL) enables collaborative model training without exposing local data but is highly vulnerable to backdoor attacks, particularly under non-IID client distributions and persistent malicious participants. Existing defenses often rely on robust aggregation or auxiliary data, yet their effectiveness diminishes under challenging conditions such as low poisoning ratios and heterogeneous data, and they remain susceptible to adaptive or stealthy adversaries. We propose adversarial unlearning of poisoned features (AUPF), an in-training defense that generates adversarial perturbations on benign clients to expose vulnerable decision boundaries and explicitly regularizes the feature representations of clean and perturbed samples. This feature-level alignment suppresses poisoned associations and ensures that robustness acquired locally propagates to the global model despite dynamic updates and client heterogeneity. We design a bi-level optimization framework that integrates seamlessly with FL training and show that it achieves computational efficiency comparable to lightweight baselines while avoiding the scalability issues of prior defenses. Extensive experiments across diverse datasets, attack strategies, and non-IID scenarios demonstrate that AUPF consistently achieves lower attack success rates while maintaining high clean accuracy, establishing it as an effective and scalable defense for backdoor-resilient federated learning.
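
Illustrative sketch (not the authors' implementation): the abstract describes a benign client generating adversarial perturbations and regularizing the features of clean and perturbed samples toward each other. The toy below shows one plausible reading of that idea on a linear feature map with a logistic head. Everything in it is an assumption made for illustration: the single-step sign-gradient (FGSM-style) perturbation, the squared-distance alignment term, the weight `lam`, and all function names. The bi-level optimization and the federated aggregation mentioned in the abstract are not reproduced.

```python
import math

def features(W, x):
    # Hypothetical linear feature extractor: f(x) = W x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def logit(W, v, x):
    # Hypothetical linear head on top of the features: v . f(x)
    return sum(vi * fi for vi, fi in zip(v, features(W, x)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # Binary cross-entropy for a single sample with label y in {0, 1}
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm_perturb(W, v, x, y, eps):
    # Assumed perturbation: one sign-gradient step that increases the loss.
    # For this model, d(BCE)/dx = (sigmoid(logit) - y) * W^T v.
    err = sigmoid(logit(W, v, x)) - y
    grad = [err * sum(v[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x))]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def local_objective(W, v, x, y, eps=0.1, lam=1.0):
    # Clean task loss plus a feature-alignment penalty between the clean
    # sample and its adversarially perturbed copy (assumed squared L2).
    x_adv = fgsm_perturb(W, v, x, y, eps)
    clean_loss = bce(sigmoid(logit(W, v, x)), y)
    f_clean, f_adv = features(W, x), features(W, x_adv)
    align = sum((a - b) ** 2 for a, b in zip(f_clean, f_adv))
    return clean_loss + lam * align, clean_loss, align

# Toy usage with made-up numbers
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, -1.0]
x, y = [0.5, 0.2], 1
total, clean_loss, align = local_objective(W, v, x, y)
```

In this toy, minimizing the alignment term with respect to `W` would push the feature map to be less sensitive in the directions the perturbation exploits, which is the intuition the abstract gives for suppressing poisoned associations; how the actual method achieves this is specified in the paper, not here.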

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 8827
