Variational Network Quantization
Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: We formulate the preparation of a neural network for pruning and few-bit quantization as a variational inference problem. We introduce a quantizing prior that leads to a multi-modal, sparse posterior distribution over weights and further derive a differentiable KL approximation for this prior. After training with Variational Network Quantization (VNQ), weights can be replaced by deterministic quantization values with small to negligible loss of task accuracy (including pruning by setting weights to 0). Our method does not require fine-tuning after quantization. We show results for ternary quantization on LeNet-5 (MNIST) and DenseNet-121 (CIFAR-10).
  • TL;DR: We quantize and prune neural network weights using variational Bayesian inference with a multi-modal, sparsity inducing prior.
  • Keywords: Network compression, variational inference, ternary network, Bayesian neural network, weight quantization, weight sharing
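The final step described in the abstract, replacing trained weights with deterministic ternary values (with pruning as assignment to 0), can be sketched as a nearest-level assignment. This is an illustrative simplification, not the paper's full VNQ procedure: the function name `ternary_quantize` and the fixed quantization levels are assumptions for the example, whereas in VNQ the levels and assignments emerge from the learned posterior.

```python
def ternary_quantize(weights, levels=(-1.0, 0.0, 1.0)):
    """Map each weight to the nearest of the given quantization levels.

    Assigning a weight to level 0.0 corresponds to pruning it.
    NOTE: fixed levels are a simplifying assumption for illustration;
    VNQ derives quantization targets from the variational posterior.
    """
    return [min(levels, key=lambda q: abs(w - q)) for w in weights]


# Example: small weights are pruned to 0, the rest snap to +/-1.
print(ternary_quantize([0.9, -0.1, 0.05, -0.8]))
```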