On Distributed Adaptive Optimization with Gradient Compression


Sep 29, 2021 (edited Oct 06, 2021) — ICLR 2022 Conference Blind Submission
  • Abstract: We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression is applied to reduce communication during gradient transmission, and the resulting bias is corrected via error feedback. Our convergence analysis of COMP-AMS shows that this gradient averaging strategy attains the same convergence rate as standard AMSGrad, and also exhibits a linear speedup effect w.r.t. the number of local workers. Compared with recently proposed protocols for distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments justify the theoretical findings and demonstrate that the proposed method can achieve the same test accuracy as full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive methods.
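The abstract describes three ingredients: each worker compresses its gradient, error feedback adds the previous round's compression residual back in before compressing, and the server averages the compressed gradients and takes an AMSGrad step. The sketch below illustrates this loop under stated assumptions: it is not the authors' implementation, the class names (`Worker`, `AMSGradServer`) and the choice of top-k as the compressor are illustrative, and hyperparameters are generic defaults.

```python
import numpy as np

def topk_compress(v, k):
    """Illustrative compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class Worker:
    """Local worker that compresses gradients with error feedback."""
    def __init__(self, dim):
        self.error = np.zeros(dim)  # residual left over from previous compression

    def compressed_grad(self, grad, k):
        corrected = grad + self.error          # error feedback: add back old residual
        compressed = topk_compress(corrected, k)
        self.error = corrected - compressed    # store new residual for next round
        return compressed

class AMSGradServer:
    """Server that averages the workers' (compressed) gradients and applies AMSGrad."""
    def __init__(self, dim, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.m = np.zeros(dim)      # first-moment estimate
        self.v = np.zeros(dim)      # second-moment estimate
        self.v_hat = np.zeros(dim)  # running max of v (the AMSGrad modification)
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps

    def step(self, params, grads):
        g = np.mean(grads, axis=0)                       # gradient averaging
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g**2
        self.v_hat = np.maximum(self.v_hat, self.v)      # monotone second moment
        return params - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)
```

A typical round would call `compressed_grad` on each worker's local stochastic gradient and pass the list to `server.step`; because the error-feedback residual is re-injected before each compression, no gradient information is permanently discarded, which is what lets the compressed scheme match the full-gradient convergence rate.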