Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression
Keywords: distributed optimization, communication compression, pipeline parallel optimization, lazy sampling
TL;DR: We present a communication compression framework that achieves convergence without relying on unbiased gradient assumptions, sample-wise memory overhead, or multi-epoch training.
Abstract: Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur sample-wise memory overhead. This paper introduces **Clapping**, a **C**ommunication compression algorithm with **LA**zy sam**P**ling for **P**ipeline-parallel learn**ING**. Clapping adopts a lazy sampling strategy that reuses data samples across steps, breaking the sample-wise memory barrier and supporting convergence in few-epoch or even online training regimes. Clapping comprises two variants, **Clapping-FC** and **Clapping-FU**, both of which converge without relying on the unbiased gradient assumption and effectively address compression error propagation in multi-worker settings. Among them, Clapping-FU achieves an asymptotic convergence rate of $\mathcal{O}(1/\sqrt{T})$. Numerical experiments validate the performance of Clapping across different learning tasks.
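To make the two ideas in the abstract concrete, the following is a minimal single-process sketch of pipeline-parallel training with compressed inter-worker messages and a lazy-sampling schedule that reuses a mini-batch across consecutive steps. Everything here is an illustrative assumption (the top-k compressor, the `reuse_period` parameter, and the toy two-layer "pipeline"), not the paper's actual Clapping-FC or Clapping-FU algorithm.

```python
# Sketch only: biased (top-k) compression of the forward activation and the
# backward activation-gradient, plus lazy sampling that redraws data only
# every `reuse_period` steps. Names and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def top_k_compress(x, k):
    """Keep the k largest-magnitude entries and zero the rest (a biased compressor)."""
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out

# Toy two-stage pipeline: worker 1 owns W1, worker 2 owns W2.
d_in, d_hid, k, lr, reuse_period = 32, 16, 4, 0.05, 4
W1 = rng.normal(scale=0.1, size=(d_hid, d_in))
W2 = rng.normal(scale=0.1, size=(1, d_hid))

batch = None
for step in range(20):
    # Lazy sampling: draw a fresh sample only every `reuse_period` steps,
    # otherwise reuse the previous one.
    if step % reuse_period == 0:
        x = rng.normal(size=d_in)
        y = np.sin(x).sum()  # synthetic regression target
        batch = (x, y)
    x, y = batch

    # Forward pass: worker 1 sends a compressed activation to worker 2.
    a = W1 @ x
    a_hat = top_k_compress(a, k)
    pred = float(W2 @ a_hat)

    # Backward pass: worker 2 sends a compressed activation-gradient back.
    err = pred - y
    g_a = err * W2.ravel()
    g_a_hat = top_k_compress(g_a, k)

    # Local parameter updates on each worker using the (biased) compressed messages.
    W2 -= lr * err * a_hat[None, :]
    W1 -= lr * np.outer(g_a_hat, x)

    print(f"step {step:2d}  loss {0.5 * err**2:.4f}")
```

Because top-k is a biased compressor, this sketch already exhibits the setting the abstract targets: the messages exchanged between stages are not unbiased gradient estimates, which is exactly why an analysis that does not rely on the unbiased gradient assumption is needed.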
Supplementary Material: zip
Primary Area: optimization
Submission Number: 3569