LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication

Published: 15 Apr 2026, Last Modified: 15 Apr 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: In decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and efficiency. Adaptive gradient methods, such as Adam, have demonstrated strong practical performance in deep learning and in centralized distributed settings. However, their convergence properties remain largely unexplored in decentralized settings involving multiple local training steps, such as federated learning. To address this limitation, we propose LoDAdaC, a unified Multiple \textbf{Lo}cal Training (MLT) \textbf{D}ecentralized framework with \textbf{Ada}m-type updates and \textbf{C}ompressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad, and it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. Together, MLT and CC give LoDAdaC a multiplicative reduction in communication cost, while the adaptive updates enable fast convergence. We rigorously establish this combined advantage through a complexity analysis. In addition, experiments on image classification and GPT-style language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in convergence speed and communication efficiency.
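To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch of how multiple local Adam-type steps can be combined with compressed gossip over a mixing matrix. It is not the paper's actual algorithm: the function names (`topk`, `adam_step`, `lodadac_round`), the CHOCO-style error-feedback gossip with public copies `xhats`, and the parameters `tau`, `k`, and `gamma` are all assumptions made for illustration only.

```python
import numpy as np


def topk(x, k):
    """Biased top-k sparsification (one example of a standard biased compressor)."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest-magnitude entries
    out[idx] = x[idx]
    return out


def adam_step(x, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-type local update; an AMSGrad variant would additionally track max_t v_t."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    return x - lr * m / (np.sqrt(v) + eps), m, v


def lodadac_round(xs, xhats, grad_fn, W, tau=5, k=10, gamma=0.5):
    """One hypothetical round: tau local adaptive steps per agent, then compressed gossip.

    xs    : list of per-agent parameter vectors
    xhats : per-agent public copies maintained for compressed gossip (an assumption,
            borrowed from CHOCO-style schemes, not necessarily the paper's construction)
    grad_fn(i, x) : returns agent i's stochastic gradient at x
    W     : doubly stochastic mixing matrix of the communication graph
    """
    n = len(xs)
    ms = [np.zeros_like(xs[0]) for _ in range(n)]
    vs = [np.zeros_like(xs[0]) for _ in range(n)]
    # Multiple local training (MLT): several adaptive updates without communication
    for _ in range(tau):
        for i in range(n):
            xs[i], ms[i], vs[i] = adam_step(xs[i], grad_fn(i, xs[i]), ms[i], vs[i])
    # Compressed communication (CC): exchange compressed differences, then mix
    qs = [topk(xs[i] - xhats[i], k) for i in range(n)]
    xhats = [xhats[i] + qs[i] for i in range(n)]
    xs = [xs[i] + gamma * sum(W[i][j] * (xhats[j] - xhats[i]) for j in range(n))
          for i in range(n)]
    return xs, xhats
```

The sketch only illustrates why the savings multiply: communication happens once every `tau` local steps, and each exchange transmits a compressed difference rather than the full model.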
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. Changed "Adam-updates" to "adaptive gradients" in the title. 2. Added more discussion of the differences from SQuARM-SGD. 3. Added one new baseline algorithm to the experimental comparison. 4. Extended the experiments to more computing agents and more communication topologies. 5. Added experiments on heterogeneous data. 6. Added consensus curves to the figures. 7. Added discussion of the strong bounded-gradient assumption.
Code: https://github.com/DecentralizedMethods/LoDAdaC
Assigned Action Editor: ~Franck_Iutzeler1
Submission Number: 6265