Keywords: Asynchronous Optimization, Decentralized Optimization, Block-Coordinate Descent, SGD, Non-Convex Optimization
Abstract: Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds, unpredictable communication delays, and diverse local data distributions. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions of bounded computation and communication times. To analyze its convergence for non‑convex objectives, we first study Asynchronous Stochastic Block Coordinate Descent (ASBCD) as a theoretical tool and employ a \textit{double‑stepsize technique} to handle the interplay between stochasticity and asynchrony. This approach allows us to establish convergence of ADSGD under \textit{computation‑delay‑independent} step sizes, without assuming bounded data heterogeneity. Empirical results show that ADSGD remains robust in practice even under extreme data heterogeneity and converges several times faster than existing methods in wall‑clock time. With its simplicity, efficiency in memory and communication, and resilience to delays, ADSGD is well‑suited for real‑world decentralized learning tasks.
Primary Area: optimization
Submission Number: 15306
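To make the double‑stepsize idea described in the abstract concrete, the following is a minimal, self‑contained simulation sketch of asynchronous decentralized SGD that separates a gradient step size from a gossip/mixing step size. It is illustrative only and is not the paper's ADSGD algorithm: the least‑squares objective, ring topology, delay model, and step‑size values (`gamma`, `eta`) are all hypothetical choices made for the sketch.

```python
# Sketch: asynchronous decentralized SGD with two step sizes.
# Assumptions (not from the paper): least-squares local losses, ring topology,
# random wake-up asynchrony, and a probabilistic communication-delay model.

import numpy as np

rng = np.random.default_rng(0)

n_agents, dim = 8, 5
# Heterogeneous local data: each agent owns its own least-squares problem.
A = [rng.normal(size=(20, dim)) for _ in range(n_agents)]
b = [rng.normal(size=20) for _ in range(n_agents)]

def stochastic_grad(i, x):
    """Stochastic gradient of agent i's local loss on a random mini-batch."""
    idx = rng.choice(20, size=4, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

# Ring topology: each agent gossips with its two neighbours.
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

x = [np.zeros(dim) for _ in range(n_agents)]            # local iterates
last_seen = [[np.zeros(dim) for _ in range(n_agents)]   # stale copies of
             for _ in range(n_agents)]                  # neighbours' iterates

gamma = 0.05   # gradient step size (hypothetical value)
eta = 0.5      # consensus/mixing step size (hypothetical value)

for t in range(2000):
    # Asynchrony: one randomly chosen agent wakes up and updates,
    # seeing only delayed copies of its neighbours' iterates.
    i = rng.integers(n_agents)

    # Gossip step toward the (stale) neighbourhood average, scaled by eta.
    avg = np.mean([last_seen[i][j] for j in neighbors[i]], axis=0)
    mix = x[i] + eta * (avg - x[i])

    # Local stochastic gradient step, scaled by gamma.
    x[i] = mix - gamma * stochastic_grad(i, x[i])

    # Each neighbour receives the update only with some probability,
    # modelling bounded but unpredictable communication delays.
    for j in neighbors[i]:
        if rng.random() < 0.7:
            last_seen[j][i] = x[i].copy()

print("max disagreement:", max(np.linalg.norm(x[i] - x[0]) for i in range(n_agents)))
```

Using two separate step sizes lets the gossip (consensus) strength and the gradient step be tuned independently, which is one simple way to reason about the interaction between stochastic noise and asynchrony that the abstract alludes to.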