Differentiable Sparsification for Deep Neural Networks

21 May 2021 (modified: 05 May 2023) · NeurIPS 2021 Submitted · Readers: Everyone
Abstract: Deep neural networks have relieved human experts of much of the feature engineering burden. However, comparable effort is now required to determine an effective architecture. Moreover, as networks have grown ever larger, considerable resources are also invested in reducing their size. Sparsifying an over-complete model addresses these problems by removing redundant parameters or connections. In this study, we propose a fully differentiable sparsification method for deep neural networks, which allows parameters to become exactly zero during training with stochastic gradient descent. The proposed method can therefore learn the sparsified structure and weights of a network simultaneously in an end-to-end manner; it can be applied directly to modern deep neural networks and imposes minimal overhead on the training process. To the authors' best knowledge, it is the first fully [sub-]differentiable sparsification method that zeroes out components, and it provides a foundation for future structure learning and model compression methods.
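
To make the idea concrete, the following is a minimal sketch, in PyTorch, of one way a sub-differentiable gate can drive components to exactly zero under stochastic gradient descent. It is an illustrative assumption, not the paper's exact formulation: the SparseGate module, its parameter names (alpha, beta), the relu-thresholded exponential parameterization, and the L1 penalty in the usage example are all hypothetical choices used only to show how end-to-end differentiable pruning can work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseGate(nn.Module):
    # Illustrative gate: each component i is scaled by a coefficient that
    # relu clips to exactly zero once its score falls below a learned
    # threshold, while the expression as a whole stays sub-differentiable.
    def __init__(self, n_components: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(n_components))    # unconstrained scores
        self.beta = nn.Parameter(torch.full((1,), -4.0))        # learned threshold,
                                                                 # initialized low so no
                                                                 # component starts clipped

    def forward(self) -> torch.Tensor:
        w = torch.exp(self.alpha)
        # components whose score drops below the threshold become exactly zero
        return F.relu(w - torch.sigmoid(self.beta) * w.sum())

# Usage sketch: gate the output channels of a convolution and add an L1-style
# penalty on the gates so that training prunes some channels to exact zero.
gate = SparseGate(n_components=16)
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(2, 3, 32, 32)
g = gate()
y = conv(x) * g.view(1, -1, 1, 1)
loss = y.pow(2).mean() + 1e-3 * g.sum()  # surrogate task loss + sparsity penalty
loss.backward()

Because the gate values are ordinary differentiable functions of trainable parameters (with relu supplying a subgradient at zero), structure and weights are learned together by the same optimizer, with no separate pruning or fine-tuning stage.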
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip