FedSMU: Communication-Efficient and Generalization-Enhanced Federated Learning through Symbolic Model Updates

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Significant communication overhead and client data heterogeneity pose important challenges to the current federated learning (FL) paradigm. Existing compression-based and optimization-based FL algorithms typically focus on addressing either the model compression challenge or the data heterogeneity issue individually, rather than tackling both. In this paper, we observe that by symbolizing the client model updates to be uploaded (i.e., normalizing the magnitude of each model parameter at local clients), the model heterogeneity, which essentially stems from data heterogeneity, can be mitigated, thereby helping to improve the overall generalization performance of the globally aggregated model at the server. Inspired by this observation, and further motivated by the success of the Lion optimizer in achieving optimal performance on most tasks in centralized learning, we propose a new FL algorithm, called FedSMU, which simultaneously reduces the communication overhead and alleviates the data heterogeneity issue. Specifically, FedSMU splits the standard Lion optimizer into local updates and global execution, where only the symbol of the client model updates is communicated between the clients and the server. We theoretically prove the convergence of FedSMU for general non-convex settings. Through extensive experimental evaluations on several benchmark datasets, we demonstrate that FedSMU not only reduces the communication overhead, but also achieves better generalization performance than other compression-based and optimization-based baselines.
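The abstract describes the sign-only client/server exchange only at a high level, so below is a minimal, hypothetical sketch of that idea. The Lion-style momentum constants (beta1, beta2), the server learning rate, and the element-wise majority-vote aggregation are assumptions made for illustration, not details taken from the paper; see the linked repository for the actual implementation.

```python
# Illustrative sketch of a sign-only federated round (NOT the authors' code).
# Assumptions: Lion-style local step with beta1/beta2, majority-vote aggregation,
# and a fixed server learning rate.
import numpy as np

def client_update(momentum, local_grad, beta1=0.9, beta2=0.99):
    """One Lion-style local step: return the sign of the update and the new momentum."""
    # Interpolate momentum and fresh gradient, then keep only the sign (+1/-1).
    update_sign = np.sign(beta1 * momentum + (1 - beta1) * local_grad)
    # Momentum stays on the client; only `update_sign` is uploaded.
    new_momentum = beta2 * momentum + (1 - beta2) * local_grad
    return update_sign, new_momentum

def server_aggregate(global_params, client_signs, server_lr=1e-3):
    """Element-wise majority vote over uploaded signs, then one global step."""
    vote = np.sign(np.mean(client_signs, axis=0))  # assumed aggregation rule
    return global_params - server_lr * vote

# Toy round with 3 clients and a 5-parameter model.
rng = np.random.default_rng(0)
params = rng.normal(size=5)
momenta = [np.zeros(5) for _ in range(3)]
signs = []
for i in range(3):
    grad = rng.normal(size=5)  # stand-in for a local mini-batch gradient
    s, momenta[i] = client_update(momenta[i], grad)
    signs.append(s)
params = server_aggregate(params, np.stack(signs))
```

In this toy round, each client uploads roughly one bit per parameter (the sign), which is the source of the communication savings described above.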
Lay Summary: Federated learning is a distributed learning paradigm that enables multiple devices to collaboratively train an AI model without sharing their raw data. Under this framework, two bottlenecks become serious: the large volume of model updates incurs very high communication overhead, while each device's unique data can pull the overall AI model in many different directions. Our proposed algorithm, FedSMU, tackles both of these challenges at the same time. It lets each device send only the sign (i.e., "+" or "–") of every weight change in the AI model, reducing the communication overhead to a few bits per parameter while naturally dampening the conflicts among update directions that originate from the data heterogeneity across devices. We further pair this "symbol-only" trick with a split version of the Lion optimizer, which runs partly on the devices and partly on the server. We prove that FedSMU still converges in the non-convex settings of current AI models, and on image and text benchmarks it cuts bandwidth by up to ten-fold while outperforming leading compression-based and optimization-based federated learning methods. This paves the way for faster, fairer, and more private AI in bandwidth-constrained settings.
Link To Code: https://github.com/lxy66888/fedsmu.git
Primary Area: General Machine Learning->Everything Else
Keywords: Federated learning, efficient communication, enhanced generalization
Submission Number: 10177