LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning

ICLR 2026 Conference Submission 12705 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Distributed Computing, Compressed Communication, Federated Learning
TL;DR: We propose a lightweight and efficient dynamic gradient compression method that adjusts each layer's compression ratio based on the layer size and the training iteration.
Abstract: Distributed learning has achieved remarkable success in training deep neural networks (DNNs) on large datasets, but the communication bottleneck limits its scalability. Various compression techniques have been proposed to alleviate this limitation; however, they either use fixed parameters throughout training or rely on complex and computationally intensive methods to adapt the compression parameters. Instead of relying on the hard-to-tune hyperparameters required by adaptive compressors, in this paper we investigate the impact of two fundamental factors in DNN training, namely the layer size of the network and its training phase, to design a simple yet efficient dynamic scheduler that guides the selection of compression parameters for any compressor. We present a **L**ightweight **E**fficient **G**r**A**dient **C**ompression strateg**Y**, or LEGACY, which, in theory, can work with any compression technique to produce a simple dynamic counterpart. We benchmark LEGACY on distributed and federated training, involving six different DNN architectures across large and challenging datasets, including ImageNet and WikiText-103. On ImageNet-1K, with equivalent average data volume, LEGACY's dynamic compression strategies improve the Top-1 accuracy of ResNet-50 by 7-11% compared to uniform Top-0.1% compression, while on WikiText-103, the layer-based dynamic strategy reduces the perplexity of Transformer-XL by ~26% relative to the same baseline. In addition, we evaluate LEGACY under constrained and federated settings, and demonstrate that it scales effectively to a 100-worker configuration while maintaining strong accuracy under aggressive compression. We publish anonymized code at: https://github.com/LEGACY-compression/LEGACY.
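
The abstract describes a scheduler that selects a per-layer compression ratio from the layer size and the current training iteration. The sketch below is only an illustration of that idea, assuming a top-k sparsifier in PyTorch; the function names, the size cutoff, and the phase thresholds are hypothetical stand-ins, not the actual schedule used by LEGACY.

```python
import torch

def dynamic_topk_ratio(layer_numel: int, step: int, total_steps: int,
                       small_layer_cutoff: int = 10_000,
                       early_frac: float = 0.1,
                       loose_ratio: float = 0.01,
                       tight_ratio: float = 0.001) -> float:
    """Illustrative policy: pick a per-layer keep ratio from layer size and training progress.

    Assumed (hypothetical) rule: small layers and early iterations keep more
    gradient entries; large layers in later phases keep only 0.1% of entries.
    """
    if layer_numel <= small_layer_cutoff:   # small layers are cheap to send, so keep more
        return loose_ratio
    if step < early_frac * total_steps:     # early phase: gradients change quickly, keep more
        return loose_ratio
    return tight_ratio                      # large layers, later phase: aggressive top-0.1%

def topk_compress(grad: torch.Tensor, keep_ratio: float):
    """Return the largest-magnitude entries (values and indices) of a flattened gradient."""
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices

# Example: compress one layer's gradient at iteration 5,000 of 100,000
grad = torch.randn(1_000_000)
ratio = dynamic_topk_ratio(layer_numel=grad.numel(), step=5_000, total_steps=100_000)
values, indices = topk_compress(grad, ratio)
```

The same scheduling idea could wrap other compressors (e.g., quantizers) by mapping layer size and training phase to their respective parameters instead of a top-k keep ratio.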
Primary Area: optimization
Submission Number: 12705