Abstract: Deep network compression seeks to reduce the number of parameters in a network while maintaining a certain level of performance. Deep network distillation seeks to train a smaller network that matches the soft-max output of a larger network. While both regimes have led to impressive results for their respective goals, neither provides insight into the importance of a given layer in the original model, which would improve our understanding of these highly parameterized models. In this paper, we present the concept of deep net triage, which individually assesses small blocks of convolutional layers to understand their collective contribution to overall performance, a quantity we call \emph{criticality}. We call it triage because we assess this criticality by answering the question: what is the impact on the health of the overall network if we compress a block of layers into a single layer?
We propose a suite of triage methods and compare them on problem spaces of varying complexity. We ultimately show that, across these problem spaces, deep net triage is able to indicate the relative importance of different layers. Surprisingly, our local structural compression technique also leads to an improvement in overall accuracy when the final model is fine-tuned globally.
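To make the idea concrete, the following is a minimal PyTorch sketch of what such a triage step could look like: a block of convolutional layers is swapped for a single convolutional layer, and the criticality of the block is read off from the accuracy change after fine-tuning. The architecture and all names here (TriagedNet, compress_second_block) are illustrative assumptions, not the authors' code.

import torch.nn as nn

class TriagedNet(nn.Module):
    """Toy VGG-style network in which one block can be compressed into a single conv layer."""
    def __init__(self, compress_second_block: bool = False):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        if compress_second_block:
            # Local structural compression: one conv layer stands in for the whole block.
            block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        else:
            block2 = nn.Sequential(
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            )
        self.block2 = nn.Sequential(block2, nn.MaxPool2d(2))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10))

    def forward(self, x):
        return self.head(self.block2(self.block1(x)))

# Criticality of block2 ~ accuracy drop of the compressed model after global fine-tuning,
# relative to the uncompressed baseline (training loop omitted for brevity).
baseline = TriagedNet(compress_second_block=False)
compressed = TriagedNet(compress_second_block=True)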
TL;DR: We seek to understand learned representations in compressed networks via an experimental regime we call deep net triage
Keywords: Deep Compression, Deep Learning, Parent-Teacher Networks