Abstract: Decentralized learning has emerged as a powerful approach for handling large datasets across multiple machines in a communication-efficient manner. However, such methods often face scalability limitations, as increasing the number of machines beyond a certain point negatively impacts convergence rates. In this work, we propose *Decentralized Anytime SGD*, a novel decentralized learning algorithm that significantly extends the critical parallelism threshold, enabling the effective use of more machines without compromising performance. Within the stochastic convex optimization (SCO) framework, we establish a theoretical upper bound on parallelism that surpasses the current state-of-the-art, allowing larger networks to achieve favorable statistical guarantees and closing the gap with centralized learning in highly connected topologies.
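For intuition about the decentralized setting the abstract describes, the sketch below shows one round of *generic* decentralized SGD with gossip averaging over a ring of machines. It is not the paper's Decentralized Anytime SGD, whose update rule is not given in the abstract; the mixing matrix, topology, toy objective, and names such as `local_grad` are illustrative assumptions.

```python
# A minimal, generic sketch of decentralized SGD with gossip averaging,
# for intuition only. It is NOT the paper's Decentralized Anytime SGD;
# all names and choices here (ring topology, mixing matrix W, toy
# quadratic objective) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim = 8, 5          # number of machines and model dimension
lr = 0.1                     # step size

# Doubly stochastic mixing matrix for a ring topology: each node
# averages with itself and its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

x = rng.normal(size=(n_nodes, dim))   # one parameter vector per node


def local_grad(x_i):
    """Noisy gradient of a toy quadratic f_i(x) = 0.5 * ||x||^2."""
    return x_i + 0.01 * rng.normal(size=x_i.shape)


for t in range(100):
    # 1) Each node takes a local stochastic gradient step.
    grads = np.stack([local_grad(x[i]) for i in range(n_nodes)])
    x = x - lr * grads
    # 2) Nodes gossip: mix parameters with neighbors via W.
    x = W @ x

# How far the nodes' models are from their average (consensus error).
print("disagreement:", np.linalg.norm(x - x.mean(axis=0)))
```

In this style of method, a more connected topology (a better-conditioned mixing matrix) keeps the nodes closer to consensus, which is why the abstract highlights highly connected topologies as the regime where decentralized learning approaches centralized performance.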
Lay Summary: When training machine learning models on large datasets, we often need to split the work across multiple computers to make the process faster and more manageable. However, there is a frustrating catch: adding more machines beyond a certain point actually makes the learning process less efficient.
We developed a new training algorithm called Decentralized Anytime SGD that increases the number of machines you can use before hitting this performance wall. Our approach allows larger networks of computers to work together more efficiently by improving how they share information and coordinate their learning progress.
Through mathematical analysis, we proved that our method can handle significantly larger networks of machines while still maintaining good learning performance. This improvement brings decentralized learning much closer to the performance you would get from a single, centralized system, but with all the practical benefits of distributed computing. Our work allows organizations to use more machines effectively for training AI models, improving the scalability of distributed machine learning.
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: Decentralized Learning, Stochastic Convex Optimization
Submission Number: 6268