Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a new framework for analyzing Scaffold with stochastic gradients, showing that it achieves linear speed-up and giving an expression for its bias.
Abstract: This paper proposes a novel analysis for the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings—where local control variates mitigate client drift—is well established, the impact of stochastic gradient updates on its performance is less understood. To address this problem, we first show that its global parameters and control variates define a Markov chain that converges to a stationary distribution in the Wasserstein distance. Leveraging this result, we prove that Scaffold achieves linear speed-up in the number of clients up to higher-order terms in the step size. Nevertheless, our analysis reveals that Scaffold retains a higher-order bias, similar to FedAvg, that does not decrease as the number of clients increases. This highlights opportunities for developing improved stochastic federated learning algorithms.
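For context, the sketch below illustrates Scaffold-style local updates with stochastic gradients and control variates (following the control-variate update of Karimireddy et al., 2020). The quadratic local losses, step sizes, and Gaussian noise model are illustrative assumptions, not the exact setting analyzed in this paper.

```python
import numpy as np

# Minimal Scaffold sketch with stochastic gradients (illustrative only).
# Assumed local losses f_i(x) = 0.5 * ||A_i x - b_i||^2; gradients are
# perturbed with Gaussian noise to mimic stochastic updates.

rng = np.random.default_rng(0)
N, d, K, T = 10, 5, 10, 200            # clients, dimension, local steps, rounds
eta_l, eta_g, sigma = 0.05, 1.0, 0.1   # local/global step sizes, noise level

A = [rng.normal(size=(d, d)) for _ in range(N)]
b = [rng.normal(size=d) for _ in range(N)]

def stoch_grad(i, x):
    """Stochastic gradient of client i's loss at x (assumed noise model)."""
    return A[i].T @ (A[i] @ x - b[i]) + sigma * rng.normal(size=d)

x = np.zeros(d)                         # global model
c = np.zeros(d)                         # server control variate
c_i = [np.zeros(d) for _ in range(N)]   # client control variates

for t in range(T):
    deltas_x, deltas_c = [], []
    for i in range(N):
        y = x.copy()
        for _ in range(K):              # drift-corrected local SGD steps
            y -= eta_l * (stoch_grad(i, y) - c_i[i] + c)
        c_new = c_i[i] - c + (x - y) / (K * eta_l)   # control-variate update
        deltas_x.append(y - x)
        deltas_c.append(c_new - c_i[i])
        c_i[i] = c_new
    x += eta_g * np.mean(deltas_x, axis=0)           # server model update
    c += np.mean(deltas_c, axis=0)                   # full participation assumed

# x approaches the minimizer of the average loss, up to a stochastic bias.
```

The pair (x, c_i) iterated this way is the Markov chain whose stationary behavior the paper studies; averaging more clients reduces the variance term, but a higher-order bias in the step size remains.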
Lay Summary: Federated learning is a way to train machine learning models across many devices (like smartphones) without needing to gather all their data in one place. A popular method in federated learning is Scaffold, which makes it possible to learn correctly while reducing the number of communication rounds. While Scaffold is well studied with exact updates, it’s less clear how it performs when updates are based on noisy or approximate information, which often happens in practice. This paper takes a fresh look at Scaffold under these realistic conditions. We show that, over time, the shared model and Scaffold's control variables settle into a stable state. Thanks to this insight, we prove that adding more devices helps the model learn faster, up to a certain limit. We also highlight that Scaffold still suffers from a bias that doesn’t go away, no matter how many devices are added. This suggests there’s still room to improve federated learning methods to make them more accurate and efficient in real-world settings.
Link To Code: https://github.com/pmangold/scaffold-speed-up
Primary Area: Optimization->Convex
Keywords: optimization, federated learning, stochastic gradient descent, stochastic optimization, heterogeneity
Submission Number: 7223