Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression

10 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: distributed optimization, communication compression, pipeline parallel optimization, lazy sampling
TL;DR: We present a communication compression framework that achieves convergence without relying on the unbiased-gradient assumption, sample-wise memory overhead, or multi-epoch training.
Abstract: Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur memory overhead that scales with the sample size. This paper introduces **Clapping**, a **C**ommunication compression algorithm with **LA**zy sam**P**ling for **P**ipeline-parallel learn**ING**. Clapping adopts a lazy sampling strategy that reuses data samples across steps, breaking the sample-wise memory barrier and supporting convergence in few-epoch or even online training regimes. Clapping comprises two variants, **Clapping-FC** and **Clapping-FU**, both of which achieve convergence without relying on the unbiased gradient assumption, effectively addressing compression error propagation in multi-worker settings. Among them, Clapping-FU has an asymptotic convergence rate of $\mathcal{O}(1/\sqrt{T})$. Numerical experiments validate the performance of Clapping across different learning tasks.
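The lazy sampling idea described in the abstract, reusing one minibatch for several consecutive steps while compressing the activations sent between pipeline stages, can be illustrated with a toy sketch. This is a hypothetical illustration only, not the authors' Clapping-FC/FU algorithms: the `top_k` compressor, the linear "downstream worker", and the `reuse` schedule are all assumptions made for the example.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x; zero the rest (a biased compressor)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def lazy_sampling_loop(data, steps, reuse=4, k=2, lr=0.1, seed=0):
    """Toy two-stage pipeline: the upstream worker compresses its activation
    before sending it downstream; the same sample is reused for `reuse`
    consecutive steps (lazy sampling), so the downstream worker keeps
    optimizing against a fixed compressed target between fresh draws."""
    rng = np.random.default_rng(seed)
    w = np.zeros(data.shape[1])            # downstream worker's parameters (linear head)
    sample = None
    for t in range(steps):
        if t % reuse == 0:                 # draw a fresh sample only every `reuse` steps
            sample = data[rng.integers(len(data))]
        act = top_k(sample, k)             # compressed "activation" sent downstream
        grad = act * (act @ w - 1.0)       # toy regression gradient on the downstream stage
        w -= lr * grad
    return w
```

In this sketch only one compressed vector per `reuse`-step window has to be remembered, rather than one per sample, which is the memory saving the lazy sampling strategy is aiming at.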
Supplementary Material: zip
Primary Area: optimization
Submission Number: 3569