Tight Analysis of Decentralized SGD: a Markov Chain Perspective

Lucas Versini; Paul Mangold; Aymeric Dieuleveut

Published: 03 Feb 2026, Last Modified: 02 May 2026, Venue: AISTATS 2026 Poster, License: CC BY 4.0

TL;DR: We study Decentralized Stochastic Gradient Descent with constant step size, proving convergence to a stationary distribution and deriving first-order expressions for its bias and variance, revealing a linear speed-up in the number of agents.

Abstract: We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show that DSGD converges to a stationary distribution, with its bias, to first order, decomposable into two components: one due to decentralization (growing with the graph's spectral gap and heterogeneity) and one due to stochasticity. Remarkably, the variance of local parameters is, to first order, inversely proportional to the number of agents, regardless of the network topology and even when clients' iterates are not averaged at the end. As a consequence of our analysis, we obtain non-asymptotic convergence bounds for clients' local iterates, confirming that DSGD has linear speed-up in the number of clients, and that the network topology only impacts higher-order terms.
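To make the setting concrete, here is a minimal sketch of constant-step-size DSGD on a toy problem. It is not the authors' implementation (see the code URL for that). Everything specific in it is an assumption chosen for illustration: the quadratic local objectives f_i(x) = 0.5 * ||x - targets[i]||^2, the additive Gaussian gradient noise, the ring topology with uniform 1/3 weights, the "gradient step, then gossip" form of the update, and the helper names `ring_gossip_matrix` and `dsgd`.

```python
import numpy as np


def ring_gossip_matrix(n_agents):
    """Symmetric, doubly stochastic mixing matrix for a ring (illustrative choice)."""
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        # Each agent averages itself and its two ring neighbours with weight 1/3.
        for j in (i - 1, i, i + 1):
            W[i, j % n_agents] += 1.0 / 3.0
    return W


def dsgd(W, targets, step_size, n_iters, noise_std, rng):
    """Constant-step-size DSGD on f_i(x) = 0.5 * ||x - targets[i]||^2.

    Assumed update (one common variant): X <- W @ (X - step_size * G),
    where row i of G is a noisy gradient of f_i at agent i's iterate.
    """
    X = np.zeros_like(targets, dtype=float)  # one row of parameters per agent
    for _ in range(n_iters):
        noise = noise_std * rng.standard_normal(X.shape)
        grads = (X - targets) + noise         # stochastic local gradients
        X = W @ (X - step_size * grads)       # local step, then gossip averaging
    return X


rng = np.random.default_rng(0)
n_agents, dim = 8, 3
W = ring_gossip_matrix(n_agents)
targets = rng.standard_normal((n_agents, dim))  # heterogeneous local optima
X = dsgd(W, targets, step_size=0.1, n_iters=500, noise_std=0.5, rng=rng)

# The global optimum of the average objective is the mean of the targets.
optimum = targets.mean(axis=0)
print("error of agent 0:", np.linalg.norm(X[0] - optimum))
print("error of the agent average:", np.linalg.norm(X.mean(axis=0) - optimum))
```

Because the step size is constant, the iterates in this sketch do not settle on a point: they keep fluctuating, which is the behaviour the Markov chain viewpoint describes through a stationary distribution. Rerunning with different values of `n_agents`, `noise_std`, or `step_size` gives a rough feel for how the fluctuations of a single agent's iterate depend on those quantities in this toy setting.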

Code Dataset Promise: Yes

Code Dataset Url: https://github.com/lucas-versini/DSGD

Submission Number: 1425