- Keywords: Learning Representation, Information Bottleneck, Model Compression
- Abstract: In this paper, we first investigate the representations learned by convolutional neural networks at filter-wise granularity by computing the mutual information between the channels of higher convolutional layers and the input or output variables. We then identify approximate minimal sufficient statistics of the learned representation based on the information bottleneck principle, and propose a novel approach to automatically compress a neural network. This approach structurally prunes a large trained network in the post-training phase by extracting relevant information backward through the network, layer by layer. Our experimental results are consistent with the two fundamental data processing inequalities, and show that mutual information is a fundamental quantity for examining the efficiency of internal representations at filter-wise granularity. In addition, using the information bottleneck principle to interpret structural compression is an efficient way to get closer to the information-theoretic limit of the compression/prediction problem. Finally, based on the observed results, we argue that compression is causally linked to improved generalization performance.