REPRESENTATION COMPRESSION AND GENERALIZATION IN DEEP NEURAL NETWORKS

Ravid Shwartz-Ziv; Amichai Painsky; Naftali Tishby

REPRESENTATION COMPRESSION AND GENERALIZATION IN DEEP NEURAL NETWORKS

Ravid Shwartz-Ziv, Amichai Painsky, Naftali Tishby

27 Sept 2018 (modified: 05 May 2023)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Understanding the groundbreaking performance of Deep Neural Networks is one of the greatest challenges to the scientific community today. In this work, we introduce an information theoretic viewpoint on the behavior of deep networks optimization processes and their generalization abilities. By studying the Information Plane, the plane of the mutual information between the input variable and the desired label, for each hidden layer. Specifically, we show that the training of the network is characterized by a rapid increase in the mutual information (MI) between the layers and the target label, followed by a longer decrease in the MI between the layers and the input variable. Further, we explicitly show that these two fundamental information-theoretic quantities correspond to the generalization error of the network, as a result of introducing a new generalization bound that is exponential in the representation compression. The analysis focuses on typical patterns of large-scale problems. For this purpose, we introduce a novel analytic bound on the mutual information between consecutive layers in the network. An important consequence of our analysis is a super-linear boost in training time with the number of non-degenerate hidden layers, demonstrating the computational benefit of the hidden layers.

Keywords: Deep neural network, information theory, training dynamics

TL;DR: Introduce an information theoretic viewpoint on the behavior of deep networks optimization processes and their generalization abilities

8 Replies

Loading