Coordinating Distributed Example Orders for Provably Accelerated Training

A. Feder Cooper; Wentao Guo; Khiem Pham; Tiancheng Yuan; Charlie F. Ruan; Yucheng Lu; Christopher De Sa

Published: 21 Sept 2023, Last Modified: 18 Jan 2024

Venue: NeurIPS 2023 poster

Keywords: permuted example ordering, distributed training, scalable training, herding

TL;DR: Provably faster convergence in distributed training via coordinated example ordering

Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
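
The idea of ordering examples from stale gradients can be illustrated with a small sketch. This is not the authors' implementation. It assumes that per-example gradients from the previous epoch are available as plain lists of floats, it uses a simple greedy sign rule for the balancing step, and the function name `balance_and_reorder` is hypothetical. It covers only a centralized reordering step; the coordination across workers that CD-GraB adds is not shown.

```python
def balance_and_reorder(order, grads):
    """Build a new example order from one epoch's (stale) gradients.

    order: list of example indices in the order they were visited.
    grads: dict mapping example index -> gradient as a list of floats.
    Both the data layout and the greedy sign rule are assumptions made
    for illustration only.
    """
    n = len(order)
    dim = len(grads[order[0]])

    # Center the gradients so that the balancing step works on deviations
    # from the mean gradient (assumed to be known here).
    mean = [sum(grads[i][k] for i in order) / n for k in range(dim)]

    running = [0.0] * dim   # signed running sum of centered gradients
    front, back = [], []    # examples assigned sign +1 / sign -1

    for i in order:
        c = [grads[i][k] - mean[k] for k in range(dim)]
        # Squared norm of the running sum if we add or subtract c.
        plus = sum((running[k] + c[k]) ** 2 for k in range(dim))
        minus = sum((running[k] - c[k]) ** 2 for k in range(dim))
        if plus <= minus:
            running = [running[k] + c[k] for k in range(dim)]
            front.append(i)
        else:
            running = [running[k] - c[k] for k in range(dim)]
            back.append(i)

    # Positive-sign examples keep their relative order at the front;
    # negative-sign examples are appended in reverse order at the back.
    return front + back[::-1]
```

For example, with `order = [0, 1, 2, 3]` and one-dimensional gradients `{0: [1.0], 1: [1.0], 2: [-1.0], 3: [-1.0]}`, the function returns `[0, 2, 3, 1]`. The largest absolute prefix sum of the gradients drops from 2 under the original order to 1 under the new one, which is the kind of balance such orderings aim for.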

Supplementary Material: pdf

Submission Number: 5087
