FedGraph: Defending Federated Large Language Model Fine-Tuning Against Backdoor Attacks via Graph-Based Aggregation

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Federated learning, Large Language Model, Backdoor defense
Abstract: Federated fine-tuning of large language models (LLMs) enables collaborative training without sharing raw data, offering a promising solution to data scarcity and privacy concerns. However, this setting is highly vulnerable to backdoor attacks, where adversaries inject malicious updates that preserve normal performance on benign inputs but induce targeted responses when triggered. We first demonstrate that backdoor attacks remain effective in the federated LoRA fine-tuning scenario, exposing a critical security risk. We further show that existing federated learning defenses are inadequate, as the high dimensionality and entanglement of LLM updates undermine anomaly detection methods. To overcome these challenges, we introduce \textit{FedGraph}, a graph-based aggregation framework. FedGraph represents client updates as nodes in a dynamic graph, extracts topological features including degree, betweenness, and closeness centrality, and uses these to construct low-dimensional fingerprints of client behavior. An unsupervised clustering process then separates malicious from benign participants. Extensive experiments confirm that FedGraph achieves state-of-the-art defense against LLM backdoor attacks, reducing the attack success rate to below 10\%, while delivering high detection accuracy (95.5\% on average) and a low false-positive rate (2.33\%), significantly outperforming existing defenses.
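The abstract's pipeline (similarity graph over client updates → degree/betweenness/closeness centrality → low-dimensional fingerprints) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `centrality_fingerprints`, the cosine-similarity edge rule, and the threshold `tau` are all assumptions for demonstration; betweenness is computed with Brandes' algorithm on the unweighted graph.

```python
import numpy as np
from collections import deque

def centrality_fingerprints(updates, tau=0.5):
    """Hypothetical sketch of FedGraph-style fingerprints: connect clients
    whose flattened updates have cosine similarity >= tau, then return a
    per-client [degree, closeness, betweenness] centrality vector."""
    n = len(updates)
    U = np.stack([u / (np.linalg.norm(u) + 1e-12) for u in updates])
    sim = U @ U.T  # pairwise cosine similarity
    adj = [[j for j in range(n) if j != i and sim[i, j] >= tau]
           for i in range(n)]

    deg = np.array([len(a) for a in adj]) / max(n - 1, 1)
    clo = np.zeros(n)
    btw = np.zeros(n)
    for s in range(n):
        # BFS from s: shortest-path counts (sigma) and predecessors
        dist = [-1] * n; dist[s] = 0
        sigma = [0.0] * n; sigma[s] = 1.0
        preds = [[] for _ in range(n)]
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        reach = [d for d in dist if d > 0]
        clo[s] = len(reach) / sum(reach) if reach else 0.0
        # Brandes' back-propagation of pair dependencies
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                btw[w] += delta[w]
    btw /= 2.0  # undirected graph: each pair counted twice
    if n > 2:
        btw /= (n - 1) * (n - 2) / 2  # normalize to [0, 1]
    return np.column_stack([deg, clo, btw])
```

In this toy picture, benign clients whose updates point in similar directions form a densely connected core with high degree centrality, while backdoored updates sit on the sparse periphery; the resulting 3-D fingerprints can then be fed to any unsupervised clustering step (e.g. 2-means, flagging the smaller cluster) to separate the two groups.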
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9582