Stochastic Gradient Coding for Flexible Straggler Mitigation in Distributed Learning

Rawad Bitar, Mary Wootters, Salim El Rouayheb

Published: 2019, Last Modified: 16 May 2025. ITW 2019. CC BY-SA 4.0

Abstract: We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are slow or non-responsive. In this work we propose a new type of approximate gradient coding, which we call Stochastic Gradient Coding (SGC). The idea of SGC is very simple: we distribute data points redundantly to workers according to a good combinatorial design. We prove that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent (SGD) for the ℓ2 loss function, and show how the convergence rate can improve with the redundancy. We show empirically that SGC requires only a small amount of redundancy to handle a large number of stragglers and that it can outperform existing approximate gradient codes when the number of stragglers is large.
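
The idea of redundant data placement with ignored stragglers can be illustrated with a minimal sketch. This is not the authors' construction: the abstract does not specify the combinatorial design, so the sketch assumes a uniformly random placement of each data point on d workers, an independent straggling probability p per worker, and a 1/(d(1-p)) rescaling chosen so that the aggregated update equals the full ℓ2-loss gradient in expectation. All function names and parameter values below are hypothetical.

```python
import random


def partial_gradient(w, x, y):
    # Gradient of the squared error 0.5 * (w.x - y)^2 with respect to w.
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def assign_redundantly(n_points, n_workers, d, rng):
    # ASSUMPTION: random placement stands in for the paper's combinatorial design.
    # Each data point is stored on d distinct workers.
    workers = [[] for _ in range(n_workers)]
    for i in range(n_points):
        for j in rng.sample(range(n_workers), d):
            workers[j].append(i)
    return workers


def sgc_step(w, data, workers, d, p, lr, rng):
    # One update: every worker straggles independently with probability p,
    # and the master simply ignores stragglers.
    total = [0.0] * len(w)
    for points in workers:
        if rng.random() < p:
            continue  # straggler: its partial gradients are dropped
        for i in points:
            x, y = data[i]
            g = partial_gradient(w, x, y)
            total = [t + gi for t, gi in zip(total, g)]
    # ASSUMPTION: each point arrives d * (1 - p) times on average, so dividing
    # by that factor makes the update an unbiased estimate of the mean gradient.
    scale = lr / (d * (1.0 - p) * len(data))
    return [wi - scale * t for wi, t in zip(w, total)]


def run_demo(seed=0, steps=500):
    # Noiseless linear regression with a known weight vector (toy data).
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    data = []
    for _ in range(40):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(a * b for a, b in zip(true_w, x))
        data.append((x, y))
    workers = assign_redundantly(len(data), n_workers=10, d=3, rng=rng)
    w = [0.0, 0.0]
    for _ in range(steps):
        w = sgc_step(w, data, workers, d=3, p=0.3, lr=0.5, rng=rng)
    return w
```

Calling `run_demo()` returns the learned weights for the toy problem. Because the toy data are noiseless, every partial gradient vanishes at the true weights, so the iterates approach them even though roughly 30% of workers are dropped in each step. Increasing `d` in this sketch reduces the variance of each update, which is one informal way to see why redundancy could help the convergence rate.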