Achieving Linear Speedup and Near-Optimal Complexity for Decentralized Optimization over Row-stochastic Networks
Abstract: A key challenge in decentralized optimization is determining the optimal convergence rate and designing algorithms that achieve it. While this problem has been extensively addressed for doubly-stochastic and column-stochastic mixing matrices, the row-stochastic scenario remains unexplored. This paper bridges this gap by introducing effective metrics to capture the influence of row-stochastic mixing matrices and establishing the first convergence lower bound for decentralized learning over row-stochastic networks. However, existing algorithms fail to attain this lower bound due to two key issues: deviation in the descent direction caused by the adapted gradient tracking (GT), and instability introduced by the Pull-Diag protocol. To address the descent deviation, we propose a novel analysis framework demonstrating that Pull-Diag-GT achieves linear speedup, the first such result for row-stochastic decentralized optimization. Moreover, by incorporating a multi-step gossip (MG) protocol, we resolve the instability issue and attain the lower bound, achieving near-optimal complexity for decentralized optimization over row-stochastic networks.
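For intuition, the sketch below (NumPy) runs a row-stochastic gradient-tracking loop of the kind described above on a toy problem. Everything in it is an illustrative assumption rather than the paper's method: the quadratic objectives, the small directed graph whose mixing matrix is row- but not column-stochastic, the step size, the Perron-weight estimate taken from the diagonal of the iterated mixing matrix (the Pull-Diag idea), and the optional parameter R for extra gossip rounds per iteration (the multi-step gossip idea). The recursion follows a standard row-stochastic gradient-tracking template; the exact Pull-Diag-GT and MG updates in the paper may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                      # number of nodes, problem dimension
alpha, T, R = 0.02, 500, 1       # step size, iterations, gossip rounds per iteration (R > 1 = multi-step gossip)

# Illustrative local objectives f_i(x) = 0.5 * ||x - b_i||^2, so the global
# minimizer of (1/n) * sum_i f_i is the mean of the b_i.
b = rng.standard_normal((n, d))
grad = lambda i, x: x - b[i]

# Row-stochastic mixing matrix on a directed ring with one extra random
# in-edge per node: each row sums to one, but columns generally do not.
A = np.zeros((n, n))
for i in range(n):
    nbrs = {i, (i - 1) % n, rng.integers(n)}
    w = rng.random(len(nbrs)) + 0.1
    A[i, list(nbrs)] = w / w.sum()
A_mg = np.linalg.matrix_power(A, R)          # multi-step gossip: mix R times per iteration

X = np.zeros((n, d))                         # local models
V = np.eye(n)                                # Pull-Diag: V = A^k, its diagonal estimates the Perron weights
g_old = np.array([grad(i, X[i]) for i in range(n)])   # scaled gradients (estimate is 1 at k = 0)
Z = g_old.copy()                             # gradient-tracking directions

for _ in range(T):
    X = A_mg @ X - alpha * Z                 # mix, then descend along the tracked direction
    V = A_mg @ V
    pi_hat = np.clip(np.diag(V), 1e-12, None)          # Pull-Diag estimate of each node's Perron weight
    g_new = np.array([grad(i, X[i]) / pi_hat[i] for i in range(n)])
    Z = A_mg @ Z + g_new - g_old             # gradient tracking with Pull-Diag scaling
    g_old = g_new

x_star = b.mean(axis=0)
print("consensus error :", np.linalg.norm(X - X.mean(axis=0)))
print("optimality error:", np.linalg.norm(X.mean(axis=0) - x_star))
```

The scaling by the estimated Perron weight is what corrects for the bias a row-stochastic matrix introduces: without it, iterated mixing drives the nodes toward a Perron-weighted rather than uniform average of their updates.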
Lay Summary: In many modern systems, groups of computers or devices work together to solve problems without relying on a central coordinator. This is known as decentralized optimization. A key challenge in this area is understanding how quickly these systems can reach a good solution and designing methods that do this as efficiently as possible.
Most past research has focused on specific ways these devices share information, but one important case has not been well studied: the setting where each device can only assign weights to the information it receives, which requires the use of a row-stochastic mixing matrix. This paper addresses this gap by introducing new ways to measure how these matrices affect performance and by proving the first lower limit on how fast learning can happen in such systems.
We find that existing algorithms fail to reach this lower bound for two reasons: the adapted gradient tracking (GT) step causes the update direction to drift away from the true descent direction, and the Pull-Diag protocol introduces instability. To solve the first problem, we introduce a new way to analyze the method and show that Pull-Diag-GT can speed up learning in proportion to the number of devices, the first result of its kind for row-stochastic decentralized optimization. To solve the second, we incorporate a multi-step gossip (MG) protocol, in which devices exchange information several times per update; this removes the instability, attains the lower bound, and yields near-optimal complexity for decentralized optimization over row-stochastic networks.
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: decentralized optimization, stochastic optimization, non-convex, row stochastic network
Submission Number: 6528