Learning N:M  Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou; Yukun Ma; Junnan Zhu; Jianbo Liu; Zhijie Zhang; Kun Yuan; Wenxiu Sun; Hongsheng Li

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Published: 12 Jan 2021, Last Modified: 22 Jun 2025ICLR 2021 PosterReaders: Everyone

Keywords: sparsity, efficient training and inference.

Abstract: Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple individual weights distributed across the neural network, and structured coarse-grained sparsity which prunes blocks of sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence receives limited speed gains. On the other hand, coarse-grained sparsity cannot simultaneously achieve both apparent acceleration on modern GPUs and decent performance. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs. Specifically, a 2 : 4 sparse network could achieve 2× speed-up without performance drop on Nvidia A100 GPUs. Furthermore, we propose a novel and effective ingredient, sparse-refined straight-through estimator (SR-STE), to alleviate the negative influence of the approximated gradients computed by vanilla STE during optimization. We also define a metric, Sparse Architecture Divergence (SAD), to measure the sparse network’s topology change during the training process. Finally, We justify SR-STE’s advantages with SAD and demonstrate the effectiveness of SR-STE by performing comprehensive experiments on various tasks. Anonymous code and model will be at available at https://github.com/anonymous-NM-sparsity/NM-sparsity.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: a simple yet universal recipe to learn N:M sparse neural networks from scratch

Code: [![github](/images/github_icon.svg) anonymous-NM-sparsity/NM-sparsity](https://github.com/anonymous-NM-sparsity/NM-sparsity) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=K9bw7vqp_s)

Data: [FlyingChairs](https://paperswithcode.com/dataset/flyingchairs), [ImageNet](https://paperswithcode.com/dataset/imagenet), [MS COCO](https://paperswithcode.com/dataset/coco)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/learning-n-m-fine-grained-structured-sparse/code)

12 Replies

Loading