Abstract: Decentralized federated learning (DFL) has emerged as a promising paradigm for distributed machine learning over edge nodes (i.e., workers) without relying on a centralized parameter server. Most existing DFL research relies on synchronous communication among workers. However, due to edge heterogeneity and dynamic network conditions, synchronous DFL mechanisms may suffer from inefficient model training and poor scalability. Meanwhile, existing asynchronous DFL (ADFL) mechanisms suffer from stale models among workers, leading to diminished training quality, especially on Non-IID data. In this paper, we propose a novel staleness-aware ADFL (SA-ADFL) mechanism that strikes a trade-off between model training efficiency and quality through dynamic staleness control. Specifically, we provide a rigorous theoretical analysis of SA-ADFL and formulate the worker scheduling problem to minimize total model training time under flexible long-term staleness constraints. We then decompose the original round-coupled problem into a series of single-round sub-problems by leveraging Lyapunov optimization, enabling efficient worker selection that minimizes training time in each round while ensuring staleness queue stability. Experimental results demonstrate that SA-ADFL accelerates model training by approximately 52.9% while maintaining accuracy comparable to state-of-the-art mechanisms.
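The Lyapunov decomposition described above can be illustrated with a minimal sketch. All names below (virtual queues `Q`, trade-off weight `V`, staleness bound `s_max`, and the greedy per-round selection rule) are illustrative assumptions about the general drift-plus-penalty pattern, not the paper's actual notation or algorithm.

```python
# Hypothetical sketch of Lyapunov drift-plus-penalty worker selection for
# an asynchronous DFL round: each worker has a virtual staleness queue Q;
# per round we pick workers balancing queue pressure against round time.
from dataclasses import dataclass

@dataclass
class Worker:
    wid: int
    est_time: float   # estimated compute + communication time this round
    staleness: int    # rounds since this worker's model was last aggregated

def select_workers(workers, Q, V, k):
    """Greedy single-round selection: score each worker by the benefit of
    serving its staleness queue minus V times its time cost, pick top-k.
    Larger V prioritizes fast workers; large queues force stale ones in."""
    ranked = sorted(workers,
                    key=lambda w: Q[w.wid] * w.staleness - V * w.est_time,
                    reverse=True)
    return [w.wid for w in ranked[:k]]

def update_queues(workers, selected, Q, s_max):
    """Virtual queue update Q <- max(Q + incurred staleness - s_max, 0);
    selecting a worker resets its staleness, so its queue drains."""
    for w in workers:
        s = 0 if w.wid in selected else w.staleness + 1
        Q[w.wid] = max(Q[w.wid] + s - s_max, 0)
```

A quick usage example: a slow but very stale worker (large queue backlog) is selected ahead of a faster worker with no backlog, which is how the long-term staleness constraint is enforced on average rather than in every round.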