Training Compressed Fully-Connected Networks with a Density-Diversity Penalty

Shengjie Wang, Haoran Cai, Jeff Bilmes, William Noble

Published: 06 Feb 2017, Last Modified: 05 May 2023. ICLR 2017 Poster.

Abstract: Deep models have achieved great success on a variety of challenging tasks. However, the models that achieve high performance often have an enormous number of parameters, which places correspondingly large demands on both computational and memory resources, especially in fully-connected layers. In this work, we propose a new “density-diversity penalty” regularizer that can be applied to the fully-connected layers of neural networks during training. We show that this regularizer yields significantly fewer nonzero parameters (i.e., high sparsity) and significantly fewer distinct values (i.e., low diversity), so that the trained weight matrices can be highly compressed without any appreciable loss in performance. The resulting trained models can therefore run on computational platforms (e.g., portable and Internet-of-Things devices) where they would otherwise be prohibitively costly.
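The abstract does not state the penalty's formula. As a minimal sketch, assume a penalty on a single weight matrix W of the form lam * (sum over all pairs of |w_a - w_b| + sum of |w_a|): the pairwise term rewards entries that share values (low diversity), and the L1 term rewards exact zeros (high sparsity). This form, the function name `density_diversity_penalty`, and the scale `lam` are assumptions for illustration, not the authors' definition. Computed naively, the pairwise term costs O(n²) for n entries; sorting the entries first reduces it to O(n log n) using a standard identity.

```python
import numpy as np


def density_diversity_penalty(W, lam=1e-4):
    """Hypothetical density-diversity-style penalty for one weight matrix.

    Assumed form (an illustration, not the paper's exact definition):
        lam * ( sum_{a<b} |w_a - w_b|  +  sum_a |w_a| )
    The pairwise term pulls entries toward shared values (low diversity);
    the L1 term pulls entries toward zero (high sparsity, i.e. low density).
    Returns (penalty value, (sub)gradient with the same shape as W).
    """
    w = W.ravel()
    n = w.size
    # Sorting makes the pairwise sum O(n log n) instead of O(n^2).
    order = np.argsort(w, kind="stable")
    # For sorted values v_0 <= ... <= v_{n-1}, v_k enters k pairs with a plus
    # sign and (n - 1 - k) pairs with a minus sign, so
    #   sum_{a<b} |v_a - v_b| = sum_k (2k - n + 1) * v_k.
    coeff = 2.0 * np.arange(n) - (n - 1)
    diversity = float(np.dot(coeff, w[order]))
    density = float(np.abs(w).sum())

    # (Sub)gradient: each entry receives the coefficient of its sorted
    # position (ties broken by sort order), plus sign(w) from the L1 term.
    grad = np.empty(n)
    grad[order] = coeff
    grad += np.sign(w)
    return lam * (diversity + density), lam * grad.reshape(W.shape)
```

In a training loop, the returned gradient would be added to the task-loss gradient for that layer's weights before each update. How to schedule `lam`, and how to handle entries once they reach zero or coincide, is outside what this sketch covers.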

TL;DR: We propose a new “density-diversity penalty” for fully-connected layers that yields trained weight matrices with significantly higher sparsity and lower diversity while keeping performance the same.
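To illustrate why high sparsity and low diversity make a matrix compressible, the sketch below measures both quantities and estimates storage under a hypothetical encoding: each nonzero entry is stored as a position index plus a short id into a codebook of distinct values. The helper names and the 32-bit index and value widths are assumptions; the authors' actual storage format is not described in this abstract.

```python
import numpy as np


def sparsity_and_diversity(W):
    """Return (fraction of zero entries, number of distinct nonzero values)."""
    nonzero = W[W != 0]
    sparsity = 1.0 - nonzero.size / W.size
    diversity = np.unique(nonzero).size
    return sparsity, diversity


def estimated_bits(W, index_bits=32, value_bits=32):
    """Rough storage estimate for a sparse, codebook-encoded matrix.

    Hypothetical scheme (an assumption for illustration): each nonzero is
    stored as (position index, codebook id), plus one copy of every
    distinct value in the codebook.
    """
    nonzero = W[W != 0]
    codebook = np.unique(nonzero)
    # Bits needed to address the codebook (at least 1).
    id_bits = max(1, int(np.ceil(np.log2(max(codebook.size, 1)))))
    return nonzero.size * (index_bits + id_bits) + codebook.size * value_bits
```

For example, a 4 × 4 matrix with four nonzero entries drawn from {0.5, -0.5} has sparsity 0.75 and diversity 2, and the estimate is 4 × (32 + 1) + 2 × 32 = 196 bits, compared with 16 × 32 = 512 bits for dense 32-bit storage. The savings grow as the matrix gets larger and sparser and the codebook stays small.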

Conflicts: washington.edu

Keywords: Deep learning
