TL;DR: We introduce BAME, a novel N:M sparse training method that achieves state-of-the-art performance with far fewer training FLOPs.
Abstract: N:M sparsity has become an increasingly important tool for DNN compression, achieving practical speedups by requiring at most N non-zero components within every M consecutive weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation that updates all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME keeps both forward and backward propagation sparse at all times, while iteratively pruning and regrowing weights within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment of accumulated mask-oscillation frequency and the expected loss reduction of mask adaptation, ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, showing that it performs comparably to or better than previous works that maintain fully dense backward propagation during training. For instance, BAME attains 72.0% top-1 accuracy when training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5% while using 2.37× fewer training FLOPs. Code is released at \url{https://github.com/BAME-xmu/BAME}
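To make the N:M constraint described above concrete, here is a minimal sketch, not the authors' released code, of how an N:M mask can be built by keeping the N largest-magnitude weights in every group of M consecutive weights. The function name `nm_sparse_mask` and the PyTorch usage are illustrative assumptions; BAME's block-wise prune-and-regrow and block-selection criteria are not reproduced here.

```python
import torch

def nm_sparse_mask(weight: torch.Tensor, n: int = 1, m: int = 16) -> torch.Tensor:
    """Binary mask keeping the n largest-magnitude weights in each group of m consecutive weights."""
    flat = weight.reshape(-1, m)             # group consecutive weights; numel must be divisible by m
    idx = flat.abs().topk(n, dim=1).indices  # positions of the n largest magnitudes per group
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)               # mark kept positions
    return mask.reshape(weight.shape)

# Example: 1:16 sparsity on a layer's weight matrix
w = torch.randn(64, 256)
mask = nm_sparse_mask(w, n=1, m=16)
sparse_w = w * mask                          # forward/backward use only the masked weights
```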
Lay Summary: Deep neural networks often use a technique called N:M sparsity to reduce their size and speed up computation by keeping only a small number (N) of important weights in each group of M weights. However, current methods find these important weights through expensive training that updates all weights, even the unimportant ones. We introduce BAME, a new approach that keeps the network sparse from start to finish. Instead of updating all weights, BAME continuously prunes and regrows weights within small weight blocks, focusing only on the most useful ones. It selects these blocks based on how often their masks change and how much changing them is expected to improve performance, making training faster and more stable. Our experiments show that BAME works as well as, or even better than, previous methods while being much more efficient.
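As a rough illustration of the prune-and-regrow idea described above, the following is a hedged sketch of one generic step inside a single weight block: the smallest-magnitude active weights are dropped and the same number of inactive weights with the largest gradients are regrown, keeping the block's sparsity fixed. The function `prune_and_regrow` and its drop/grow criteria are assumptions for illustration (a RigL-style heuristic) and do not reproduce BAME's block scoring based on mask-oscillation frequency and expected loss reduction.

```python
import torch

def prune_and_regrow(block_w: torch.Tensor, block_grad: torch.Tensor,
                     block_mask: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Swap k active weights for k inactive ones inside a single weight block."""
    active = block_mask.bool()
    # Prune: drop the k smallest-magnitude active weights.
    w_mag = block_w.abs().masked_fill(~active, float("inf"))
    drop = w_mag.topk(k, largest=False).indices
    # Regrow: activate the k inactive positions with the largest gradient magnitude.
    g_mag = block_grad.abs().masked_fill(active, float("-inf"))
    grow = g_mag.topk(k, largest=True).indices
    new_mask = block_mask.clone()
    new_mask[drop] = 0.0
    new_mask[grow] = 1.0
    return new_mask

# Example: one block of 16 weights at 1:16 sparsity
w = torch.randn(16)
g = torch.randn(16)
mask = torch.zeros(16)
mask[3] = 1.0
mask = prune_and_regrow(w, g, mask, k=1)
```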
Primary Area: Applications->Computer Vision
Keywords: Convolution Neural Networks, N:M Sparsity, Network Pruning
Submission Number: 4913