Channel Permutations for N:M Sparsity

21 May 2021, 20:46 (edited 08 Nov 2021)
NeurIPS 2021 Poster
Readers: Everyone
  • Keywords: network pruning, sparsity, N:M, structured sparsity, inference
  • TL;DR: N:M structured sparsity is hardware-accelerated but can fail to maintain accuracy in some cases, so we recover the lost accuracy with an offline channel permutation step.
  • Abstract: We introduce channel permutations as a method to maximize the accuracy of N:M sparse networks. N:M sparsity allows at most N nonzero values in every group of M consecutive elements (for 2:4, at least two of every four are zero), and it has been shown to maintain accuracy for many models and tasks with a simple prune-and-fine-tune workflow. By permuting weight matrices along their channel dimension and adjusting the surrounding layers appropriately, we demonstrate accuracy recovery even for small, parameter-efficient networks, without affecting inference run-time. We also present a quality metric that simplifies judging permutations, as well as efficient methods to search for high-quality permutations, including two optimizations to escape local minima. Finally, we share an ablation study showing the importance of each part of our search algorithm, experimental results showing correlation between our quality metric and final network accuracy, improved sparse network accuracy from our techniques with insignificant overhead to training time, and the transformation of unstructured sparse workloads into structured ones. Code to use these techniques when generating a 2:4 sparse network is available at https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity.
  • Supplementary Material: pdf
  • Code: https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity
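The abstract's point that permuting channels and adjusting the surrounding layers does not affect inference run-time rests on a simple identity: if the input channels of one layer are reordered, the same reordering can be applied offline to the output channels (weights and bias) of the layer that produces them, and the network computes exactly the same function. The Python sketch below checks this for two fully connected layers with an elementwise ReLU between them. The layer setup and all function names are assumptions for illustration only; convolutions, normalization layers, and residual connections require permuting additional parameters consistently, which this sketch does not cover.

```python
# Minimal sketch (assumed setup): out = W2 @ relu(W1 @ x + b1).
# Reordering W2's input channels (columns) leaves the output unchanged if
# W1's output channels (rows) and b1 are reordered with the same permutation.

def matvec(W, x, b=None):
    """Dense matrix-vector product with an optional bias."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    if b is not None:
        out = [o + bi for o, bi in zip(out, b)]
    return out

def relu(v):
    """Elementwise ReLU; elementwise ops commute with channel permutations."""
    return [max(0.0, a) for a in v]

def two_layer(W1, b1, W2, x):
    return matvec(W2, relu(matvec(W1, x, b1)))

def permute_rows(W, perm):
    """New row k is old row perm[k] (reorders a layer's output channels)."""
    return [list(W[i]) for i in perm]

def permute_columns(W, perm):
    """New column k is old column perm[k] (reorders a layer's input channels)."""
    return [[row[j] for j in perm] for row in W]

def permute_channels(W1, b1, W2, perm):
    """Apply one permutation to W2's inputs and fold it into W1 and b1 offline."""
    return permute_rows(W1, perm), [b1[i] for i in perm], permute_columns(W2, perm)

if __name__ == "__main__":
    W1 = [[1, -2], [3, 1], [-1, 4], [2, 2]]  # 4 hidden channels, 2 inputs
    b1 = [0.5, -1.0, 0.0, 1.5]
    W2 = [[1, 2, 3, 4], [-4, 3, -2, 1]]      # 2 outputs, 4 hidden channels
    x = [1.0, 2.0]
    W1p, b1p, W2p = permute_channels(W1, b1, W2, [2, 0, 3, 1])
    # Both calls print [59.0, 5.5]
    print(two_layer(W1, b1, W2, x), two_layer(W1p, b1p, W2p, x))
```

Because the reordering is baked into the stored weights before deployment, the permuted network runs the same kernels on the same shapes as the original.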
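To illustrate why the channel order matters for accuracy, the Python sketch below prunes each row of a weight matrix to 2:4 by keeping the two largest-magnitude values in each group of four, scores a column permutation by the total magnitude that survives pruning, and improves the permutation with a greedy search over pairwise column swaps. Retained magnitude is assumed here as a stand-in quality metric, and the greedy swap search is a plain hill-climbing baseline, not the paper's search algorithm; it can stall in local optima, which is the problem the paper's two escape optimizations target. All function names are hypothetical and are not part of the apex API.

```python
from itertools import combinations

def prune_n_m(row, n=2, m=4):
    """Zero all but the n largest-magnitude entries in each group of m consecutive entries."""
    if len(row) % m:
        raise ValueError("row length must be a multiple of m")
    out = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out

def permute_columns(W, perm):
    """New column k is old column perm[k]."""
    return [[row[j] for j in perm] for row in W]

def retained_magnitude(W, n=2, m=4):
    """Assumed quality metric: total |w| that survives N:M pruning."""
    return sum(abs(v) for row in W for v in prune_n_m(row, n, m))

def greedy_swap_search(W, n=2, m=4):
    """Hill-climb over column permutations by swapping columns in different groups."""
    cols = len(W[0])
    perm = list(range(cols))
    best = retained_magnitude(W, n, m)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(cols), 2):
            if a // m == b // m:
                continue  # a swap inside one group leaves the pruned result unchanged
            perm[a], perm[b] = perm[b], perm[a]
            score = retained_magnitude(permute_columns(W, perm), n, m)
            if score > best + 1e-12:
                best, improved = score, True  # keep the improving swap
            else:
                perm[a], perm[b] = perm[b], perm[a]  # undo the swap
    return perm, best

if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0, 4.0, 0.1, 0.1, 0.1, 0.1],
         [0.2, 0.1, 5.0, 6.0, 0.3, 0.1, 0.2, 0.1]]
    print("identity order:", retained_magnitude(W))
    perm, score = greedy_swap_search(W)
    print("searched order:", perm, score)
```

Rescoring the whole matrix after every candidate swap keeps the sketch short but only suits toy sizes; a practical search would rescore just the two groups a swap touches.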