Dynamic Probabilistic Pruning: Training sparse networks based on stochastic and dynamic masking

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: deep probabilistic subsampling, sparse deep learning, structured pruning, hardware-oriented pruning
Abstract: Deep Learning (DL) models are known to be heavily over-parametrized, resulting in a large memory footprint and power consumption. This hampers the use of such models in hardware-constrained edge technologies such as wearables and mobile devices. Model compression during training can be achieved by promoting sparse network structures, both through weight regularization and by leveraging dynamic pruning methods. State-of-the-art pruning methods are, however, mostly magnitude-based, which impedes their use in, e.g., binary settings. Importantly, most pruning methods do not provide structural sparsity, resulting in inefficient memory allocation and access for hardware implementations. In this paper, we propose a novel dynamic pruning solution that we term Dynamic Probabilistic Pruning (DPP). DPP leverages Gumbel top-K sampling to select subsets of weights during training, which enables exploring which weights are most relevant. Our approach allows for setting an explicit per-neuron, layer-wise sparsity level and for structural pruning across weights and feature maps, without relying on weight-magnitude heuristics. Notably, our method generates a hardware-oriented structural sparsity for fully-connected and convolutional layers that facilitates memory allocation and access, in contrast with conventional unstructured pruning. We show that DPP achieves competitive sparsity levels and classification accuracy on the MNIST, CIFAR-10, and CIFAR-100 datasets compared to a state-of-the-art baseline for various DL architectures, while respecting per-neuron sparsity constraints.
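The core mechanism described in the abstract, selecting a fixed-size subset of weights per neuron via Gumbel top-K sampling, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the learnable relevance logits, the temperature `tau`, the straight-through softmax relaxation, and names such as `gumbel_topk_mask` and `score_logits` are assumptions for this example, not the authors' exact formulation.

```python
import torch


def gumbel_topk_mask(logits: torch.Tensor, k: int, tau: float = 1.0) -> torch.Tensor:
    """Sample a binary mask keeping k entries per row via the Gumbel top-K trick.

    `logits` are learnable relevance scores, e.g. shaped (out_features, in_features),
    so the per-row top-K enforces an explicit per-neuron sparsity level.
    """
    # Perturb the logits with i.i.d. Gumbel(0, 1) noise.
    u = torch.rand_like(logits).clamp_min(1e-20)   # avoid log(0)
    gumbel = -torch.log(-torch.log(u))
    perturbed = (logits + gumbel) / tau

    # Hard selection: keep the k highest perturbed scores in each row.
    topk_idx = perturbed.topk(k, dim=-1).indices
    hard_mask = torch.zeros_like(logits).scatter_(-1, topk_idx, 1.0)

    # Straight-through estimator (an assumption here): forward uses the hard mask,
    # gradients flow through a softmax relaxation of the perturbed logits.
    soft = torch.softmax(perturbed, dim=-1)
    return hard_mask + soft - soft.detach()


# Illustrative use: mask a fully-connected weight matrix during the forward pass.
weight = torch.randn(128, 256, requires_grad=True)        # layer weights
score_logits = torch.zeros(128, 256, requires_grad=True)  # learnable relevance logits
mask = gumbel_topk_mask(score_logits, k=32)                # keep 32 of 256 inputs per neuron
sparse_weight = weight * mask                              # per-neuron structured sparsity
```

Because every row of the mask keeps exactly k entries, the resulting sparsity pattern is regular across neurons, which is what makes memory allocation and access predictable for hardware, in contrast with unstructured magnitude pruning.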
One-sentence Summary: Hardware-oriented dynamic probabilistic pruning method that learns to generate structured sparsity for fully-connected and convolutional layers
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=3dDyG9oCyr