Training Compressed Fully-Connected Networks with a Density-Diversity Penalty

Shengjie Wang, Haoran Cai, Jeff Bilmes, William Noble

Nov 04, 2016 (modified: Mar 04, 2017) ICLR 2017 conference submission
  • Abstract: Deep models have achieved great success on a variety of challenging tasks. However, the models that perform best often have an enormous number of parameters, which places correspondingly heavy demands on both computational and memory resources, especially in fully-connected layers. In this work, we propose a new “density-diversity penalty” regularizer that can be applied to the fully-connected layers of neural networks during training. We show that this regularizer yields weight matrices with significantly fewer non-zero parameters (i.e., high sparsity) and significantly fewer distinct values (i.e., low diversity), so that the trained weight matrices can be highly compressed without any appreciable loss in performance. The resulting trained models can hence reside on computational platforms (e.g., portables, Internet-of-Things devices) where deploying them would otherwise be prohibitive.
  • TL;DR: We propose a new “density-diversity penalty” for fully-connected layers that yields trained weight matrices with high sparsity and low diversity while preserving performance.
  • Keywords: Deep learning
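The abstract characterizes a compressible weight matrix by two quantities: how many entries are non-zero (density) and how many distinct values those entries take (diversity). The sketch below only measures these two quantities for a given matrix; it does not implement the penalty itself, whose formula is not given on this page. The function names and the codebook storage estimate are hypothetical illustrations, not the authors' compression scheme.

```python
import math


def density_and_diversity(weights):
    """Return (density, diversity) for a matrix given as a list of rows.

    density   = fraction of entries that are non-zero
    diversity = number of distinct non-zero values
    """
    flat = [w for row in weights for w in row]
    nonzero = [w for w in flat if w != 0.0]
    density = len(nonzero) / len(flat)
    diversity = len(set(nonzero))
    return density, diversity


def codebook_bits(weights, value_bits=32):
    """Rough storage estimate under an assumed codebook scheme.

    Assumption (not from the paper): store each distinct value once,
    plus one fixed-width index per non-zero entry. The cost of storing
    the positions of the non-zero entries is ignored.
    """
    flat = [w for row in weights for w in row]
    nonzero = [w for w in flat if w != 0.0]
    diversity = len(set(nonzero))
    if diversity == 0:
        return 0
    index_bits = max(1, math.ceil(math.log2(diversity)))
    return diversity * value_bits + len(nonzero) * index_bits


if __name__ == "__main__":
    W = [[0.5, 0.0, 0.5],
         [0.0, -0.25, 0.0]]
    print(density_and_diversity(W))
    print(codebook_bits(W), "bits vs", 6 * 32, "bits dense")
```

Lower density shrinks the number of indices to store, and lower diversity shrinks both the codebook and the width of each index, which is why the two properties together make a matrix highly compressible.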