SLowcalSGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization

Tehila Dahan; Kfir Yehuda Levy

Published: 25 Sept 2024, Last Modified: 15 Jan 2025. Venue: NeurIPS 2024 poster. License: CC BY 4.0

Keywords: Stochastic Convex Optimization

TL;DR: The first parallel training method that provably improves over Minibatch-SGD in convex heterogeneous training scenarios.

Abstract: We consider distributed learning scenarios where $M$ machines interact with a parameter server over several communication rounds in order to minimize a joint objective function. Focusing on the heterogeneous case, where different machines may draw samples from different data distributions, we design the first local update method that provably improves over the two most prominent distributed baselines, namely Minibatch-SGD and Local-SGD. Key to our approach is a slow querying technique that we customize to the distributed setting, which in turn enables better mitigation of the bias caused by local updates.
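
To make the setting concrete, the following is a minimal sketch of the two baselines named in the abstract, Minibatch-SGD and Local-SGD, on a toy heterogeneous problem. It does not implement SLowcalSGD or its slow querying technique, which the abstract does not specify. The one-dimensional quadratic objectives, the step size, the helper names (`make_grad`, `minibatch_sgd`, `local_sgd`), and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_grad(a, b, noise_std, rng):
    """Stochastic gradient oracle for the toy objective f(x) = 0.5 * a * (x - b)^2.

    Hypothetical helper: each machine gets its own (a, b), which models
    heterogeneous data distributions.
    """
    def grad(x):
        return a * (x - b) + rng.gauss(0.0, noise_std)
    return grad


def minibatch_sgd(grads, x0, rounds, k, lr):
    """Each round, every machine averages k gradients at the SAME point x."""
    x = x0
    for _ in range(rounds):
        per_machine = [sum(g(x) for _ in range(k)) / k for g in grads]
        x -= lr * sum(per_machine) / len(per_machine)
    return x


def local_sgd(grads, x0, rounds, k, lr):
    """Each round, every machine takes k local SGD steps, then the server averages."""
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for g in grads:
            y = x
            for _ in range(k):
                y -= lr * g(y)  # local update; iterates drift toward the local optimum
            local_iterates.append(y)
        x = sum(local_iterates) / len(local_iterates)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    # Two machines with different curvature and different local minimizers.
    # The minimizer of the averaged objective is (1*0 + 3*2) / (1 + 3) = 1.5.
    grads = [make_grad(1.0, 0.0, 0.0, rng), make_grad(3.0, 2.0, 0.0, rng)]
    print("minibatch:", minibatch_sgd(grads, x0=0.0, rounds=200, k=5, lr=0.1))
    print("local:    ", local_sgd(grads, x0=0.0, rounds=200, k=5, lr=0.1))
```

With the noise switched off, as in the usage above, Minibatch-SGD converges to the joint minimizer 1.5, whereas Local-SGD with a constant step size settles at a different point. That gap is a simple instance of the bias caused by local updates under heterogeneity that the abstract refers to.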

Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)

Submission Number: 6659
