Abstract: In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process, we derive the following fundamental information-theoretic fact: $\Theta\left(\frac{d}{R^2}\right)$ bits are necessary and sufficient (up to an additive $O(\log d)$ term) to describe an unbiased estimator $\hat{\boldsymbol{g}}(\boldsymbol{g})$ for any $\boldsymbol{g}$ in the $d$-dimensional unit sphere, under the constraint that $\|\hat{\boldsymbol{g}}(\boldsymbol{g})\|_2 \le R$ almost surely, for $R > 1$. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a $d$-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only $o(d)$ bits of communication, at the expense of a tolerable increase in error. The instances of our quantization scheme are obtained from well-known families of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that vqSGD also offers automatic privacy guarantees.
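To make the convex-hull construction concrete, the following is a minimal sketch (in Python with NumPy) of one natural instance of such a scheme: the point set consisting of the scaled cross-polytope vertices $\{\pm\sqrt{d}\,\boldsymbol{e}_i\}$, whose convex hull contains the $\ell_2$ unit ball since $\|\boldsymbol{g}\|_1 \le \sqrt{d}\,\|\boldsymbol{g}\|_2$. The function names vq_encode/vq_decode and the particular way the leftover convex weight is spread over the vertices are illustrative assumptions, not the paper's prescribed implementation; the sketch only demonstrates the unbiasedness mechanism: write $\boldsymbol{g}$ as a convex combination of the vertices, sample one vertex with the corresponding probability, and transmit its index in $\log_2(2d)$ bits.

```python
import numpy as np

def vq_encode(g, rng):
    """Quantize a gradient g (with ||g||_2 <= 1) to one vertex of the
    scaled cross-polytope {±sqrt(d)·e_i}. Returns the vertex index,
    which costs only log2(2d) bits to communicate."""
    d = g.shape[0]
    assert np.linalg.norm(g) <= 1.0 + 1e-9, "g must lie in the unit ball"
    # Convex-combination weights: g = sum_i a_i * c_i with sum_i a_i = 1.
    # Coordinate j contributes weight |g_j|/sqrt(d) to the vertex
    # sign(g_j)*sqrt(d)*e_j; the leftover mass (nonnegative because
    # ||g||_1 <= sqrt(d)) is spread uniformly over all 2d vertices,
    # which contributes zero in expectation since vertices come in ± pairs.
    a = np.abs(g) / np.sqrt(d)
    leftover = 1.0 - a.sum()
    probs = np.concatenate([a * (g >= 0), a * (g < 0)]) + leftover / (2 * d)
    return rng.choice(2 * d, p=probs)

def vq_decode(idx, d):
    """Reconstruct the unbiased estimate from the received index:
    indices 0..d-1 map to +sqrt(d)*e_j, indices d..2d-1 to -sqrt(d)*e_j."""
    g_hat = np.zeros(d)
    g_hat[idx % d] = np.sqrt(d) if idx < d else -np.sqrt(d)
    return g_hat

# Usage: averaging many decoded samples approaches g (unbiasedness).
rng = np.random.default_rng(0)
d = 8
g = rng.standard_normal(d)
g /= np.linalg.norm(g)  # scale into the unit ball
est = np.mean([vq_decode(vq_encode(g, rng), d) for _ in range(20000)], axis=0)
```

By construction, $\mathbb{E}[\hat{\boldsymbol{g}}] = \boldsymbol{g}$ and $\|\hat{\boldsymbol{g}}\|_2 = \sqrt{d}$ almost surely, matching the bounded-norm constraint in the abstract with $R = \sqrt{d}$.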