FedEAT: A Robustness Optimization Framework for Federated LLMs

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: federated learning; large language models; adversarial training; robustness
TL;DR: We enhance the adversarial robustness of federated LLMs through embedding-space adversarial training and loss function regularization.
Abstract: The integration of federated learning (FL) with large language models (LLMs) leverages the privacy-preserving benefits of decentralized data processing in sensitive domains such as healthcare, finance, and law, while also addressing the growing scarcity of high-quality training data for LLMs. However, in practical deployments, federated large language models (federated LLMs) are highly vulnerable to adversarial attacks, which can severely undermine their reliability and stability. To address this challenge, we introduce FedEAT (Federated Embedding-space Adversarial Training), a novel algorithm that performs adversarial training directly in each client LLM's embedding space and incorporates a regularization term to balance robustness against clean-data accuracy. Extensive experiments demonstrate that, compared to conventional federated LLMs, FedEAT substantially improves classification accuracy on adversarial examples while causing only negligible performance degradation on clean inputs, and that it generalizes to tasks in other domains. These results validate FedEAT's effectiveness and practical value in enhancing the robustness of federated LLMs in critical, privacy-sensitive applications.
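The core idea in the abstract, adversarial training in the embedding space combined with a regularizer that preserves clean-data accuracy, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the linear classifier, the PGD-style sign-step perturbation, and the weighting coefficient `beta` are all illustrative assumptions, with the gradient-based embedding perturbation standing in for the attack and the extra clean-loss term standing in for the regularization.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def loss_and_embed_grad(W, x, y):
    # Cross-entropy loss of a linear classifier over an embedding x,
    # plus the gradient of that loss with respect to x (not W):
    # perturbations are applied in embedding space, as in the abstract.
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return -np.log(p[y] + 1e-12), W.T @ (p - onehot)

def embedding_space_adv_loss(W, x, y, eps=0.1, steps=3, beta=0.5):
    # PGD-style attack in embedding space: ascend the loss via sign-of-gradient
    # steps, keeping the perturbation delta inside an L-infinity ball of radius eps.
    delta = np.zeros_like(x)
    for _ in range(steps):
        _, g = loss_and_embed_grad(W, x + delta, y)
        delta = np.clip(delta + (eps / steps) * np.sign(g), -eps, eps)
    adv_loss, _ = loss_and_embed_grad(W, x + delta, y)
    clean_loss, _ = loss_and_embed_grad(W, x, y)
    # Regularized objective (illustrative): adversarial loss plus a weighted
    # clean-data loss, trading off robustness against clean accuracy.
    return adv_loss + beta * clean_loss, adv_loss, clean_loss
```

In a federated setting, each client would minimize this regularized objective locally before the server aggregates the updates; here the sketch only shows the per-client training loss.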
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9271