Gradient-based Dynamic Sparse Training with Adaptive Rewinding

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Dynamic Sparse Training, Adaptive Rewinding, Efficient Deep Learning, Pruning, Scalability
Abstract: Deep neural networks (DNNs) deliver state-of-the-art performance across domains but impose prohibitive computational and memory costs. Pruning mitigates this challenge by removing unimportant parameters, yet conventional post-training pruning and reset-to-initialization sparse training approaches either incur high retraining costs or degrade performance on large models. To improve stability, prior post-training work rewinds weights to intermediate checkpoints, though at the cost of expensive offline analysis. We propose GDSTAR, a Gradient-based Dynamic Sparse Training framework with Adaptive Rewinding that supports models of different sizes and complexities without offline retraining. During training, GDSTAR (1) dynamically identifies stable rewind points using the Frobenius norm of the gradients, (2) selects weights to prune using accumulated gradient magnitudes, and (3) stabilizes optimization with a pruning rate that decays exponentially. Experiments across diverse DNNs and datasets demonstrate the efficiency and scalability of GDSTAR: it reaches up to 96% sparsity with an average accuracy drop of only 0.94% relative to dense models. Compared to the state-of-the-art sparse training approach at the same sparsity ratios, GDSTAR improves accuracy by 0.72% on average (up to 2.13%).
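The abstract names three mechanisms but gives no formulas for them. The following is a minimal, dependency-free Python sketch of how each could look, not the authors' implementation. It rests on assumptions that the abstract does not state: parameters and gradients are flat lists of floats; a rewind point counts as "stable" when the relative spread of the gradient Frobenius norm over a short window falls below a tolerance (the `window` and `tol` parameters are hypothetical); a weight's importance is the running sum of its absolute gradients; and the pruning rate at pruning event k is r_k = r_0 · exp(-λ·k). All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Assumption: each parameter tensor is a flat list of floats and each mask is
# a flat list of 0/1 ints, so the sketch needs only the standard library.
Params = Dict[str, List[float]]
Masks = Dict[str, List[int]]


def grad_frobenius_norm(grads: Params) -> float:
    """Frobenius norm of all gradients, viewed as one flattened vector."""
    return math.sqrt(sum(g * g for tensor in grads.values() for g in tensor))


def is_stable(norm_history: List[float], window: int = 3, tol: float = 0.05) -> bool:
    """Hypothetical criterion: over the last `window` steps, the spread of the
    gradient norm is below `tol` times its mean."""
    if len(norm_history) < window:
        return False
    recent = norm_history[-window:]
    mean = sum(recent) / window
    return mean > 0 and (max(recent) - min(recent)) / mean < tol


def maybe_set_rewind_point(weights: Params, norm_history: List[float],
                           rewind: Optional[Params]) -> Optional[Params]:
    """Snapshot the weights the first time the gradient norm looks stable;
    afterwards keep the existing snapshot unchanged."""
    if rewind is None and is_stable(norm_history):
        return {name: list(tensor) for name, tensor in weights.items()}
    return rewind


def accumulate_scores(scores: Params, grads: Params) -> None:
    """Add |gradient| to each weight's running importance score, in place."""
    for name, tensor in grads.items():
        acc = scores.setdefault(name, [0.0] * len(tensor))
        for i, g in enumerate(tensor):
            acc[i] += abs(g)


def pruning_rate(event: int, initial_rate: float, decay: float) -> float:
    """Fraction of still-active weights to prune at pruning event `event`,
    decaying exponentially: initial_rate * exp(-decay * event)."""
    return initial_rate * math.exp(-decay * event)


def prune_lowest(masks: Masks, scores: Params, rate: float) -> int:
    """Deactivate the `rate` fraction of active weights with the smallest
    accumulated score; return how many weights were pruned."""
    active = [(scores[name][i], name, i)
              for name, mask in masks.items()
              for i, keep in enumerate(mask) if keep]
    k = int(rate * len(active))
    for _, name, i in sorted(active)[:k]:
        masks[name][i] = 0
    return k
```

In a training loop, each step would append `grad_frobenius_norm(grads)` to a history and call `maybe_set_rewind_point` and `accumulate_scores`; pruning event k would then call `prune_lowest(masks, scores, pruning_rate(k, r0, decay))`. The abstract does not specify how GDSTAR uses the rewind point after pruning, so this sketch only records it.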
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 5739