Keywords: Sparse training, Mirror descent, Bregman iterations, Multilevel optimization, Sparse neural networks
Abstract: We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the sparsity these iterations naturally induce by alternating between periods of static and dynamic sparsity-pattern updates.
The key idea is to combine sparsity-inducing Bregman iterations with adaptive freezing of the network structure to enable efficient exploration of the sparse parameter space while maintaining sparsity.
We provide convergence guarantees by embedding our method in a multilevel optimization framework.
Furthermore, we empirically show that our algorithm can produce highly sparse and accurate models on standard benchmarks.
We also show that, relative to SGD training, the theoretical FLOP count is reduced from 38\% for standard Bregman iterations to 6\% for our method while maintaining test accuracy.
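For readers unfamiliar with the underlying update, the following is a minimal NumPy sketch of a linearized Bregman (mirror descent) step with an elastic-net potential, extended with a hypothetical freeze mask to mimic the alternation between static and dynamic sparsity phases described in the abstract. The function names, the toy quadratic loss, and all hyperparameters (`tau`, `lam`, `delta`, the freeze schedule) are illustrative assumptions and not taken from the submission.

```python
import numpy as np

def soft_threshold(v, lam):
    """Soft shrinkage: proximal map of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linbreg_step(v, grad, tau, lam, delta, freeze_mask=None):
    """One linearized Bregman / mirror-descent step for the elastic-net
    potential J(x) = lam * ||x||_1 + 1/(2*delta) * ||x||^2.

    v           -- dual (subgradient) variable, same shape as the weights
    grad        -- loss gradient at the current sparse weights x
    freeze_mask -- optional boolean mask; if given, only active coordinates
                   are updated, keeping the sparsity pattern static
    """
    step = tau * grad
    if freeze_mask is not None:
        step = step * freeze_mask          # frozen phase: no new weights enter
    v = v - step                           # gradient step on the dual variable
    x = delta * soft_threshold(v, lam)     # mirror map back to sparse weights
    return v, x

# Toy usage on a quadratic loss L(x) = 0.5 * ||A x - b||^2 (illustration only,
# not the authors' training setup).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
v = np.zeros(50)
x = np.zeros(50)
for it in range(200):
    grad = A.T @ (A @ x - b)
    # Hypothetical schedule: dynamic pattern update every 10th iteration,
    # frozen support otherwise.
    mask = None if it % 10 == 0 else (x != 0)
    v, x = linbreg_step(v, grad, tau=0.01, lam=0.3, delta=1.0, freeze_mask=mask)
print("nonzero weights:", np.count_nonzero(x), "of", x.size)
```

Under these assumptions, weights enter the active set only during the dynamic phases, while frozen phases train the fixed sparse support; this is the basic mechanism the abstract refers to, not a reproduction of the proposed algorithm.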
Supplementary Material: zip
Primary Area: optimization
Submission Number: 18494