IOShift: Backdoor Defense via Model Bias Shift in Federated Learning

ICLR 2026 Conference Submission 18712 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Federated learning, Backdoor defense
Abstract: As a privacy-preserving and decentralized machine learning framework, Federated Learning (FL) is vulnerable to backdoor attacks. Current backdoor defenses rely on a strong assumption: that defenders can define a benign parameter space from gradient information in order to detect or remove malicious updates. In real-world non-independent-and-identically-distributed (Non-IID) FL scenarios, however, defining such a space is particularly challenging, and these defenses exhibit inconsistent performance across systems and settings. In this paper, we reveal the Backdoor-Induced Model Bias Shift phenomenon, in which implanting backdoor shortcuts shifts the model's bias on out-of-distribution (OOD) data toward the target class. Building on this insight, we propose IOShift, a novel backdoor detection and removal method for federated learning based on model bias shift. IOShift detects malicious updates by measuring bias shifts on OOD data, using the model's bias on in-distribution data as a reference, and then applies adaptive weight pruning to preserve high utility on clean tasks. IOShift integrates seamlessly into existing FL frameworks without requiring modifications such as altering communication protocols or injecting elaborately designed auxiliary tasks. Experimental results on benchmark datasets and backdoor attacks demonstrate that IOShift consistently outperforms state-of-the-art backdoor defenses. Code is available here.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Supplementary Material: pdf
Submission Number: 18712
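As a rough illustration of the bias-shift idea described in the abstract, the sketch below estimates each client model's class bias on OOD data, compares it against an in-distribution reference bias via a KL-divergence score, and flags outlier updates. This is a minimal sketch under stated assumptions (PyTorch, a classification model, z-score outlier thresholding); the function names, the KL-based score, and the threshold are illustrative choices, not the authors' exact algorithm, and the paper's adaptive weight pruning step is not reproduced here.

```python
# Minimal sketch (not the paper's implementation): score clients by how far their
# model's class bias on OOD data drifts from an in-distribution (ID) reference bias.
import torch
import torch.nn.functional as F


def class_bias(model, loader, num_classes, device="cpu"):
    """Average softmax output over a loader -- a simple estimate of the model's class bias."""
    model.eval()
    total = torch.zeros(num_classes, device=device)
    count = 0
    with torch.no_grad():
        for inputs, _ in loader:
            probs = F.softmax(model(inputs.to(device)), dim=1)
            total += probs.sum(dim=0)
            count += probs.size(0)
    return total / max(count, 1)


def bias_shift_score(model, ood_loader, id_loader, num_classes, device="cpu"):
    """KL divergence of the OOD bias from the ID reference bias (illustrative metric)."""
    ood_bias = class_bias(model, ood_loader, num_classes, device)
    id_bias = class_bias(model, id_loader, num_classes, device)
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # which yields KL(ood_bias || id_bias) here.
    return F.kl_div(id_bias.clamp_min(1e-8).log(), ood_bias, reduction="sum").item()


def flag_suspicious_updates(client_models, ood_loader, id_loader, num_classes, z_thresh=2.0):
    """Flag client models whose bias-shift score is an outlier among all submitted updates."""
    scores = torch.tensor(
        [bias_shift_score(m, ood_loader, id_loader, num_classes) for m in client_models]
    )
    z_scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return [i for i, z in enumerate(z_scores.tolist()) if z > z_thresh]
```

In a full defense along these lines, flagged updates would be excluded from aggregation before any pruning step; the outlier test and threshold above are placeholders standing in for whatever detection rule the paper actually uses.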