Worldwide Federated Training of Language Models

Alex Iacob; Lorenzo Sani; Bill Marino; Preslav Aleksandrov; William F. Shen; Nicholas Donald Lane

25 Sept 2024 (modified: 13 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: Federated Learning, Distributed Training, Language Modeling, Natural Language Processing, Hierarchical Federated Learning, Governance

TL;DR: WorldLM is an LM training system based on federations of federations, enabling actors with varying regulatory, privacy, and security concerns to collaborate. It accounts for data heterogeneity via attention-based aggregation and residual embeddings.

Abstract: Language Model (LM) training requires vast datasets, raising legal, ethical, and practical concerns. Federated learning (FL) offers an alternative by enabling organizations to collaboratively leverage untapped reserves while minimizing data movement. However, scaling FL globally introduces challenges such as restrictions on data movement, privacy, and statistical data heterogeneity. We propose Worldwide Federated Language Model Training (WorldLM), a system that builds federations of federations. WorldLM enables each federation to autonomously meet jurisdictional or competitive constraints while managing statistical heterogeneity through attention-based aggregation of key layers and cross-federation information sharing via residual embeddings. In terms of perplexity, WorldLM outperforms standard FL and other federated baselines by up to $1.91\times$ and $3.3\times$ respectively. WorldLM scales to models with $400$M parameters, achieving $1.39\times$ lower perplexity than centralized counterparts while approaching the performance of perfectly localized models trained in an infinite-data regime. Additionally, under differential privacy constraints, WorldLM proves highly resilient in performance compared to standard FL methods, which diverge. These results establish WorldLM as an effective means for pre-training across geographic and legal boundaries.
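The "federations of federations" idea in the abstract can be illustrated with a toy sketch of bottom-up aggregation over a tree of participants. This is not the WorldLM algorithm: the abstract does not specify how attention-based aggregation of key layers or residual embeddings are computed, and neither is reproduced here. The `Node` class, the helper functions, the use of sample-count-weighted averaging, and the softmax-over-dot-products weighting are all assumptions chosen only to make the hierarchy concrete.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical participant: a leaf client or a federation of child nodes."""
    name: str
    params: List[float]          # toy stand-in for model weights
    num_samples: int = 0         # set directly for leaves, derived for federations
    children: List["Node"] = field(default_factory=list)


def weighted_average(vectors, weights):
    """Weighted mean of parameter vectors; weights need not be normalized."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def attention_weights(query, keys):
    """Softmax over scaled dot products between a parent vector and child vectors.

    Assumption: a generic attention-style weighting, not the paper's method.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def aggregate(node, use_attention=False):
    """Aggregate a subtree bottom-up and return its total sample count."""
    if not node.children:
        return node.num_samples
    counts = [aggregate(child, use_attention) for child in node.children]
    vectors = [child.params for child in node.children]
    if use_attention:
        weights = attention_weights(node.params, vectors)
    else:
        weights = counts                   # sample-count weighting, as in FedAvg
    node.params = weighted_average(vectors, weights)
    node.num_samples = sum(counts)
    return node.num_samples


# Two regional federations under one root, each aggregating independently.
eu = Node("eu", [0.0, 0.0], children=[
    Node("eu-a", [1.0, 0.0], 100),
    Node("eu-b", [3.0, 0.0], 300),
])
us = Node("us", [0.0, 0.0], children=[Node("us-a", [0.0, 4.0], 400)])
root = Node("root", [0.0, 0.0], children=[eu, us])
total_samples = aggregate(root)
```

In this sketch only parameter vectors and sample counts cross federation boundaries, and each sub-federation finishes its own aggregation before its parent sees the result, which is one simple way to picture federations operating autonomously within a larger hierarchy.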

Primary Area: foundation or frontier models, including LLMs

Submission Number: 5275
