TL;DR: We propose a new ''density-diversity penalty'' to fully-connected layers to get significantly high sparsity and low diversity trained matrices, while keeping the performance the same.
Abstract: Deep models have achieved great success on a variety of challenging tasks. How- ever, the models that achieve great performance often have an enormous number of parameters, leading to correspondingly great demands on both computational and memory resources, especially for fully-connected layers. In this work, we propose a new “density-diversity penalty” regularizer that can be applied to fully-connected layers of neural networks during training. We show that using this regularizer results in significantly fewer parameters (i.e., high sparsity), and also significantly fewer distinct values (i.e., low diversity), so that the trained weight matrices can be highly compressed without any appreciable loss in performance. The resulting trained models can hence reside on computational platforms (e.g., portables, Internet-of-Things devices) where it otherwise would be prohibitive.
Keywords: Deep learning