Learning Optimizers for Local SGD

Charles-Étienne Joseph; Benjamin Thérien; Abhinav Moudgil; Boris Knyazev; Eugene Belilovsky

Published: 28 Oct 2023, Last Modified: 22 Nov 2023

Venue: FL@FM-NeurIPS’23 Poster

Student Author Indication: Yes

Keywords: learned optimization, local sgd, communication-efficient distributed learning, meta-learning

TL;DR: We demonstrate that communication-efficient distributed learning algorithms can be meta-learned.

Abstract: Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is, on each worker, before averaging model parameters, which helps relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art optimizers for deep learning. In this work, we incorporate local optimizers that compute multiple updates into a learned optimization framework, allowing us to meta-learn potentially more efficient local SGD algorithms. Our results demonstrate that local learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. We show that the learned optimizers can generalize to new datasets and architectures, demonstrating the potential of learned optimizers for improving communication-efficient distributed learning.
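The local SGD pattern the abstract describes (several gradient steps on each worker, then parameter averaging) can be illustrated with a minimal sketch. This is plain local SGD on a toy 1-D regression problem, not the learned optimizer proposed in the paper; the function names `local_sgd` and `make_shards`, the hyperparameters, and the synthetic data are all assumptions made for illustration.

```python
import random


def local_sgd(shards, w0, lr=0.5, local_steps=4, rounds=50, seed=0):
    """Toy local SGD for 1-D linear regression y = w * x (illustrative only).

    shards: list of per-worker datasets, each a list of (x, y) pairs.
    In every round each worker starts from the shared weight, takes
    `local_steps` SGD steps on its own shard, and then the workers'
    weights are averaged. The averaging is the only communication step.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        local_weights = []
        for shard in shards:
            w_k = w  # each worker starts from the averaged model
            for _ in range(local_steps):
                x, y = rng.choice(shard)  # sample one local example
                grad = (w_k * x - y) * x  # gradient of 0.5 * (w*x - y)^2
                w_k -= lr * grad
            local_weights.append(w_k)
        # Parameter averaging across workers (one communication per round).
        w = sum(local_weights) / len(local_weights)
    return w


def make_shards(num_workers=4, samples_per_worker=32, true_w=3.0, seed=1):
    """Build hypothetical noiseless data shards, one per worker."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(samples_per_worker):
            x = rng.uniform(-1.0, 1.0)
            shard.append((x, true_w * x))
        shards.append(shard)
    return shards


shards = make_shards()
w_hat = local_sgd(shards, w0=0.0)
```

With these assumed settings each worker takes 200 gradient steps in total but the model is averaged only 50 times, which is the communication saving the abstract refers to. In the paper's setting, the hand-written update `w_k -= lr * grad` is the part a meta-learned optimizer would replace.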

Submission Number: 34
