Fast Convergent Federated Learning via Decaying SGD Updates

Md Palash Uddin, Yong Xiang, Mahmudul Hasan, Yao Zhao, Youyang Qu, Longxiang Gao

Published: 01 Feb 2026, Last Modified: 21 Jan 2026, IEEE Transactions on Big Data, CC BY-SA 4.0
Abstract: Federated Learning (FL) enables collaborative model training across decentralized devices, preserving data privacy while building a strong global machine learning model. Conventional FL methods typically require many communication rounds to converge on non-Independent and Identically Distributed (non-IID) data because they rely on a fixed number of Stochastic Gradient Descent (SGD) updates at each Communication Round (CR). In this paper, we introduce a strategy to accelerate the convergence of FL models, inspired by insights from McMahan et al.'s seminal work. We target FL convergence through a decay of the SGD updates themselves, introducing a dynamic adjustment mechanism for the local epochs and local batch size. Our method decays the number of local SGD updates as training progresses, analogous to decaying the learning rate in classical optimization. Specifically, by adaptively reducing the local epochs and increasing the local batch size based on their current values and the CR, our method improves convergence speed without compromising accuracy and effectively addresses the challenges posed by non-IID data. We provide theoretical results on the benefits of dynamically decaying SGD updates in FL, and comprehensive experiments demonstrate that our method consistently outperforms baselines in communication speedup and convergence behavior of the global model.
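The abstract does not give the exact decay formulas, so the sketch below is only a minimal illustration of the general idea: the number of local SGD updates per communication round shrinks as training progresses, here via an assumed geometric schedule with hypothetical parameters `epoch_decay` and `batch_growth`.

```python
# Illustrative sketch (not the paper's exact schedule): local epochs decay and
# local batch size grows with the communication round (CR), so the number of
# local SGD updates per round decreases over training.

def decayed_local_config(cr, init_epochs=10, init_batch=32,
                         epoch_decay=0.9, batch_growth=1.1,
                         min_epochs=1, max_batch=512):
    """Return (local_epochs, local_batch_size) for communication round `cr`.

    Hypothetical geometric schedule: epochs shrink and batch size grows
    from their initial values, both clipped to reasonable bounds.
    """
    epochs = max(min_epochs, round(init_epochs * epoch_decay ** cr))
    batch = min(max_batch, round(init_batch * batch_growth ** cr))
    return epochs, batch


if __name__ == "__main__":
    n_samples = 5000  # assumed size of one client's local dataset
    for cr in range(0, 30, 5):
        e, b = decayed_local_config(cr)
        updates = e * (n_samples // b)  # local SGD steps this round
        print(f"CR {cr:2d}: local_epochs={e:2d}, batch_size={b:3d}, "
              f"local SGD updates={updates}")
```

Running the snippet prints how the per-round local SGD update count falls as the CR index grows, mirroring the decaying-update behavior the abstract describes.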