BinSGDM: Extreme One-Bit Quantization for Communication Efficient Large-Scale Distributed Training

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023, Readers: Everyone
Keywords: Distributed Learning, Optimizer, Communication Efficiency
TL;DR: Extreme One-Bit Quantization for Communication Efficient Large-Scale Distributed Training
Abstract: To alleviate the communication bottleneck of large-scale distributed training, a rich body of communication-compression optimizers has been proposed. These methods mainly pursue a high compression ratio in the hope of acceleration. However, as some recent works have pointed out, when run with distributed training frameworks (\emph{e.g.}, \emph{DistributedDataParallel} in PyTorch), these methods may provide no acceleration over off-the-shelf uncompressed SGD/Adam in typical settings, due to heavy compression/decompression computation, incompatibility with efficient communication primitives, or the need for an uncompressed warmup at the early stage. For these reasons, we propose a novel extreme one-bit quantization optimizer, dubbed \emph{BinSGDM}. The quantization in \emph{BinSGDM} is computed with little overhead, and it does not need to resort to an uncompressed optimizer for warmup. We also theoretically prove that it enjoys the same convergence rate as the original Adam. Moreover, we present a dedicated hierarchical communication scheme to further reduce the communication volume. Extensive experiments are conducted on 8 to 64 GPUs (1 to 8 nodes) for distributed training with \emph{DistributedDataParallel}, and the results demonstrate that \emph{BinSGDM} together with this communication scheme achieves up to {$\bm{2.47\times}$} speedup for training ResNet-50 and $\bm{6.26\times}$ speedup for training BERT-Base, compared to the full-precision optimizers.
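
To make the idea of extreme one-bit quantization concrete, below is a minimal PyTorch sketch of a generic sign-based compressor with a single per-tensor scale. The function names one_bit_compress/one_bit_decompress are illustrative assumptions; this is not the paper's exact BinSGDM update rule, and it omits the hierarchical communication scheme entirely.

```python
import torch

def one_bit_compress(tensor: torch.Tensor):
    # Generic sign-based one-bit compression (illustrative only):
    # keep one float scale per tensor plus the element-wise signs,
    # which can be bit-packed to 1 bit per element before transmission.
    scale = tensor.abs().mean()   # single scalar, preserves average magnitude
    signs = torch.sign(tensor)    # +1 / -1 / 0 per element
    return scale, signs

def one_bit_decompress(scale: torch.Tensor, signs: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximation of the original tensor from the 1-bit payload.
    return scale * signs

# Usage: compress a gradient-like tensor and inspect the relative reconstruction error.
g = torch.randn(1024)
scale, signs = one_bit_compress(g)
g_hat = one_bit_decompress(scale, signs)
print(torch.norm(g - g_hat) / torch.norm(g))
```

In an actual DistributedDataParallel run, a compressor of this kind would typically be wired into a gradient communication hook (e.g., via DistributedDataParallel.register_comm_hook), with the signs bit-packed before communication; how BinSGDM integrates with DDP is described in the paper itself, not here.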
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning