Learning Optimizers for Local SGD
Student Author Indication: Yes
Keywords: learned optimization, local sgd, communication-efficient distributed learning, meta-learning
TL;DR: We demonstrate that communication-efficient distributed learning algorithms can be meta-learned.
Abstract: Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art optimizers for deep learning. In this work, we incorporate local optimizers that compute multiple updates into a learned optimization framework, allowing to meta-learn potentially more efficient local SGD algorithms. Our results demonstrate that local learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. We show that the learned optimizers can generalize to new datasets and architectures, demonstrating the potential of learned optimizers for improving communication-efficient distributed learning.
Submission Number: 34