Keywords: exchangeability, parameter space symmetry, dynamic pruning, efficient inference
TL;DR: We formalize the symmetry in neural networks using statistical exchangeability, and exploit it to dynamically prune neural networks on a per-input basis.
Abstract: Modern neural networks (NNs) contain an ever-growing number of parameters, substantially increasing the memory and computational cost of inference. Researchers have explored various ways to reduce this cost, both by shrinking the model before deployment and by dynamically pruning the inference computation at runtime. In this work, we present ExPrune, a general, dynamic pruning optimization that enables multi-granularity partial computation on a per-input basis. ExPrune requires no change to the model architecture or the training algorithm. ExPrune is based on our theoretical result that the relationship between certain model parameters and intermediate values can be described by a statistical property called \textit{exchangeability}. By identifying exchangeable parameters and values in the model, we can first partially evaluate the network, analyze the statistics of the partial results, and make pruning decisions on the fly.
Because ExPrune is theory-grounded, it generalizes across model architectures in different problem domains. We evaluate ExPrune on one computer vision model, one graph model, and one language model. ExPrune provides a 10.98--17.33\% reduction in FLOPs with negligible accuracy drop and a 21.61--27.16\% reduction in FLOPs with at most 1\% accuracy drop. We also demonstrate that ExPrune composes with static magnitude pruning. On models that have been aggressively statically pruned, ExPrune still provides an additional 10.24--11.11\% reduction in FLOPs with negligible accuracy drop and a 13.91--14.39\% reduction in FLOPs with at most 1\% accuracy drop.
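To make the "partially evaluate, then decide" idea in the abstract concrete, here is a minimal, hypothetical sketch of dynamic pruning for a single ReLU neuron: the dot product is accumulated in chunks, and an extrapolation from the statistics of the partial sum is used to predict whether the pre-activation will end up negative (so the ReLU output would be zero and the remaining terms can be skipped). The chunk size, the z-score bound, and the function name are illustrative assumptions, not the paper's actual pruning criterion or granularity.

```python
import numpy as np

def relu_neuron_with_early_exit(w, x, chunk=16, z=2.0):
    """Toy sketch of per-input dynamic pruning for one ReLU neuron.

    Accumulates w . x in chunks; after each chunk, treats the seen and
    unseen per-term products as exchangeable and extrapolates the final
    sum from their observed mean/variance. If the extrapolated upper
    bound is still negative, the ReLU output is predicted to be zero
    and the remaining multiply-accumulates are skipped.
    Chunk size and z threshold are illustrative, not the paper's values.
    """
    n = len(w)
    total, sq_sum, seen = 0.0, 0.0, 0
    for start in range(0, n, chunk):
        block = w[start:start + chunk] * x[start:start + chunk]
        total += block.sum()
        sq_sum += (block ** 2).sum()
        seen += len(block)
        remaining = n - seen
        if remaining == 0:
            break
        # Estimate the unseen contribution from the statistics of the seen terms.
        mean = total / seen
        var = max(sq_sum / seen - mean ** 2, 0.0)
        upper = total + remaining * mean + z * np.sqrt(var * remaining)
        if upper < 0.0:          # predicted pre-activation < 0 -> ReLU output ~ 0
            return 0.0, seen     # prune: skip the remaining terms
    return max(total, 0.0), seen

# Usage: a random neuron with 256 inputs.
rng = np.random.default_rng(0)
w = rng.normal(size=256)
x = rng.normal(size=256)
out, used = relu_neuron_with_early_exit(w, x)
print(f"output={out:.3f}, terms evaluated={used}/256")
```

In this sketch the saved work is only the skipped tail of one dot product; the paper's multi-granularity formulation presumably generalizes the same exchangeability-based statistical test to coarser units of computation.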
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 13259