Moniqua: Modulo Quantized Communication in Decentralized SGD

Yucheng Lu; Christopher De Sa

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: decentralized training, quantization, communication, stochastic gradient descent

TL;DR: We propose a general method that allows decentralized SGD to use quantized communication.

Abstract: Decentralized stochastic gradient descent (SGD), where parallel workers are connected to form a graph and communicate only with adjacent workers, has shown promising results both theoretically and empirically. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove that Moniqua communicates a bounded number of bits per iteration while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior work in that it (1) requires no additional memory, (2) applies to non-convex objectives, and (3) supports biased/linear quantizers. We demonstrate empirically that Moniqua converges faster with respect to wall-clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit budgets, allowing communication at fewer than 4 bits per parameter without affecting convergence when training VGG16 on CIFAR10.
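The abstract does not spell out the mechanism, so the following is only an illustrative sketch of what "modulo quantized communication" could look like for a single scalar parameter, not the paper's exact algorithm. It assumes that a sender's value `x` and the receiver's own value `y` differ by less than roughly `B / 2`, so the receiver can recover `x` from a quantized `x mod B`. The names `centered_mod`, `encode`, `decode`, `B`, and `bits` are hypothetical and chosen for this example.

```python
import math


def centered_mod(v, B):
    """Shift v by a multiple of B so the result lies in [-B/2, B/2)."""
    return v - B * math.floor(v / B + 0.5)


def encode(x, B, bits):
    """Sender side (assumed scheme): keep only x mod B, rounded to the
    nearest of 2**bits evenly spaced levels, and send the level index."""
    levels = 2 ** bits
    residue = x % B  # in [0, B] for B > 0
    # Round to the nearest level; the top level wraps around to 0.
    return int(math.floor(residue / B * levels + 0.5)) % levels


def decode(code, y, B, bits):
    """Receiver side (assumed scheme): rebuild an estimate of x from the
    level index and the receiver's own nearby value y."""
    levels = 2 ** bits
    residue = code * B / levels
    # Pick the representative of the residue class that is closest to y.
    return y + centered_mod(residue - y, B)


if __name__ == "__main__":
    B, bits = 1.0, 4
    x, y = 3.14159, 3.0  # sender value and receiver value, |x - y| < B/2
    code = encode(x, B, bits)
    x_hat = decode(code, y, B, bits)
    # The reconstruction error is at most half a quantization step.
    print(code, x_hat, abs(x_hat - x) <= B / (2 * 2 ** bits))
```

In this sketch only `bits` bits per parameter cross the network regardless of how large `x` is, and the receiver needs no stored copy of the sender's state, only its own current value. Recovery is exact up to half a quantization step as long as `|x - y| + B / 2**(bits + 1) < B / 2`; if neighbors drift further apart than that, the decoded value lands in the wrong residue class.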
