Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability

Published: 01 May 2025 · Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We present an improved analysis of Local GD for logistic regression that uses large stepsizes to converge faster, despite instability.
Abstract: Existing analyses of Local (Stochastic) Gradient Descent for heterogeneous objectives require stepsizes $\eta \leq 1/K$, where $K$ is the communication interval, which ensures a monotonic decrease of the objective. In contrast, we analyze Local Gradient Descent for logistic regression with separable, heterogeneous data using any stepsize $\eta > 0$. With $R$ communication rounds and $M$ clients, we show convergence at a rate $\mathcal{O}(1/(\eta K R))$ after an initial unstable phase lasting $\widetilde{\mathcal{O}}(\eta K M)$ rounds. This improves upon the existing $\mathcal{O}(1/R)$ rate for general smooth, convex objectives. Our analysis parallels the single-machine analysis of Wu et al. (2024), in which instability is caused by extremely large stepsizes; in our setting, large local updates with heterogeneous objectives are an additional source of instability.
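To make the setup concrete, below is a minimal sketch (not the authors' implementation) of constant-stepsize Local GD for logistic regression on separable, heterogeneous data. The synthetic data, function names, and parameter values ($\eta$, $K$, $R$, $M$) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logistic_grad(w, X, y):
    """Average gradient of the logistic loss; labels y are in {-1, +1}."""
    margins = np.clip(y * (X @ w), -30.0, 30.0)    # clip to avoid overflow in exp
    coeffs = -y / (1.0 + np.exp(margins))          # derivative of log(1 + e^{-m})
    return (X.T @ coeffs) / len(y)

def local_gd(clients, eta, K, R, d):
    """Local GD: each client runs K gradient steps with stepsize eta,
    then the server averages the client iterates; repeated for R rounds."""
    w = np.zeros(d)
    for _ in range(R):                              # R communication rounds
        local_iterates = []
        for X, y in clients:                        # M heterogeneous clients
            w_m = w.copy()
            for _ in range(K):                      # K local steps, no communication
                w_m -= eta * logistic_grad(w_m, X, y)
            local_iterates.append(w_m)
        w = np.mean(local_iterates, axis=0)         # server averaging step
    return w

# Toy usage on separable, heterogeneous synthetic data (illustrative only).
rng = np.random.default_rng(0)
d, M, n = 5, 4, 50
w_star = rng.normal(size=d)
clients = []
for m in range(M):
    X = rng.normal(loc=m, size=(n, d))              # shifted features => heterogeneity
    y = np.sign(X @ w_star)                          # labels separable by w_star
    clients.append((X, y))
w_hat = local_gd(clients, eta=10.0, K=20, R=100, d=d)  # eta >> 1/K: "unstable" regime
```

The stepsize $\eta = 10$ here is far larger than the classical $\eta \leq 1/K$ requirement; the paper's claim is that, after an initial unstable phase, such large stepsizes accelerate convergence rather than prevent it.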
Lay Summary: Machine learning is very powerful, but also very expensive in terms of resources such as computing power and data. To mitigate these resource requirements, it is common to train machine learning models in parallel across many devices, such as mobile phones, which leverages the compute power and data of many users. In this paper, we study a classic algorithm for training machine learning models in this distributed manner and prove that it can train certain machine learning models much faster than previously understood. Essentially, this speedup comes from allowing the algorithm to be "unstable", which is usually considered inadvisable; in our setting, however, the instability is precisely what produces the acceleration.
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: distributed optimization, convex optimization, logistic regression, federated learning, unstable convergence, edge of stability
Submission Number: 7843