Natural Compression for Distributed Deep Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Abstract: Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck, and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: {\em natural compression ($C_{nat}$)}. Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a ``natural'' way by ignoring the mantissa. We show that compared to no compression, $C_{nat}$ increases the second moment of the compressed vector by no more than the tiny factor $\nicefrac{9}{8}$, which means that the effect of $C_{nat}$ on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communication savings enabled by $C_{nat}$ are substantial, leading to a {\em $3$-$4\times$ improvement in overall theoretical running time}. For applications requiring more aggressive compression, we generalize $C_{nat}$ to {\em natural dithering}, which we prove is {\em exponentially better} than the common random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect, and offer a new state of the art in both theory and practice.
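
For concreteness, below is a minimal NumPy sketch of the per-entry randomized rounding that $C_{nat}$ performs. The function name `natural_compression` is ours, and the explicit `log2` arithmetic is used only for clarity; as the abstract notes, the same rounding can be carried out "naturally" on the float representation itself by keeping the sign and exponent and discarding the mantissa, so this sketch should not be read as the authors' implementation.

```python
import numpy as np

def natural_compression(x, rng=None):
    """Illustrative sketch: unbiased randomized rounding of each entry to a power of two.

    For a nonzero entry t with |t| in [2^k, 2^(k+1)], round |t| down to 2^k or up to
    2^(k+1) with probabilities chosen so that the result equals t in expectation.
    Zeros are left untouched; the sign is preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    t = np.abs(x[nz])
    lower = 2.0 ** np.floor(np.log2(t))   # 2^k <= |t|
    upper = 2.0 * lower                   # 2^(k+1) >= |t|
    p_up = t / lower - 1.0                # P(round up); yields E[output] = t
    rounded = np.where(rng.random(t.shape) < p_up, upper, lower)
    out[nz] = np.sign(x[nz]) * rounded
    return out
```

The rounding probabilities make the operator unbiased entrywise, and each compressed entry is determined by its sign and exponent alone, which is where the communication savings come from; the abstract's $\nicefrac{9}{8}$ bound refers to how much this randomization can inflate the second moment of the compressed vector.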
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=2L59eB-P_f