On Provable Benefits of Muon in Federated Learning: Improved Communication Complexity and Beyond

20 Sept 2025 (modified: 06 Oct 2025) | ICLR 2026 Conference Withdrawn Submission | Everyone | CC BY 4.0
Keywords: Federated learning
Abstract: The recently introduced optimizer, Muon, has gained increasing attention due to its superior performance across a wide range of applications. However, its effectiveness in federated learning remains unexplored. To address this gap, this paper investigates the performance of Muon in the federated learning setting. Specifically, we propose a new algorithm, FedMuon, and establish its convergence rate for nonconvex problems. Our theoretical analysis reveals multiple favorable properties of FedMuon. In particular, due to its orthonormalized update direction, FedMuon achieves significantly improved communication complexity compared to existing momentum-based federated learning methods. Furthermore, it does not rely on any heterogeneity assumptions or specialized operations to guarantee convergence, its learning rate is independent of problem-specific parameters, and, importantly, it can naturally accommodate heavy-tailed noise. Finally, extensive experiments on a variety of neural network architectures validate the effectiveness of the proposed algorithm.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 22978
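
The abstract attributes FedMuon's properties to its orthonormalized update direction. The snippet below is a minimal sketch of that one idea, not the paper's algorithm: it replaces a momentum matrix by the orthogonal factor of its thin SVD, so the step has unit spectral norm regardless of the gradient's scale. The function names, the momentum form, and the values of `lr` and `beta` are illustrative assumptions. Muon implementations typically approximate this factor with a Newton-Schulz iteration; an exact SVD is used here for clarity.

```python
import numpy as np

def orthonormalize(m):
    """Return U @ V^T from the thin SVD of m (all singular values set to 1)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def muon_style_step(w, momentum, grad, lr=0.02, beta=0.9):
    """One illustrative step (hypothetical helper): momentum average, then an
    orthonormalized direction. The exact momentum form is an assumption."""
    momentum = beta * momentum + (1.0 - beta) * grad
    w = w - lr * orthonormalize(momentum)
    return w, momentum

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))  # stand-in for a weight-matrix gradient

direction = orthonormalize(grad)
scaled_direction = orthonormalize(1000.0 * grad)  # same gradient, 1000x larger

# Take one step from zero weights and zero momentum.
w, momentum = muon_style_step(np.zeros((4, 3)), np.zeros((4, 3)), grad)
```

Because `direction` and `scaled_direction` coincide for a full-rank gradient, the step size is set by `lr` alone rather than by the gradient's magnitude, which gives some intuition for the abstract's claims about learning-rate independence and tolerance of heavy-tailed noise.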