Abstract: Along with the practical success of deep neural networks, several theories have been proposed to explain their excellent generalization behaviour. One such theory is the information bottleneck, which uses mutual information (MI) to analyze the learning dynamics of these black-box models. However, estimating MI in high dimensions and in deterministic settings is problematic, often yielding widely varying estimates across different estimators. This paper takes an alternative approach to analyzing the behaviour of deep models by using a recently proposed information measure known as sliced mutual information (SMI), which is more computationally efficient to estimate than MI in high dimensions. We theoretically connect SMI to the classifier margin, thereby showing that it encodes geometric properties of the feature distribution. We also study SMI empirically and demonstrate that the SMI between a hidden layer and the labels encodes information about the network's ability to predict labels correctly.
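To make the claimed computational advantage concrete, here is a minimal sketch of the generic Monte Carlo SMI estimator for the feature/label setting described in the abstract: average the scalar MI between random one-dimensional projections of the features and the discrete labels. This is an illustration of the standard slicing construction, not necessarily the exact estimator used in the paper; the function name `sliced_mi` and the choice of scikit-learn's k-NN MI estimator are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def sliced_mi(features, labels, num_slices=128, seed=0):
    """Monte Carlo estimate of SMI(X; Y) for continuous features X and
    discrete labels Y: average I(theta^T X; Y) over random directions theta.

    Hypothetical sketch; the paper's own estimator may differ.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    mi_values = []
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)      # uniform direction on the sphere
        proj = features @ theta             # 1-D projection of the features
        # scalar MI between the projection and the labels (k-NN estimator)
        mi = mutual_info_classif(proj.reshape(-1, 1), labels, n_neighbors=3)[0]
        mi_values.append(mi)
    return float(np.mean(mi_values))
```

Because each slice only requires a one-dimensional MI estimate, the cost scales with the number of projections rather than with the ambient feature dimension, which is the source of the efficiency advantage over direct high-dimensional MI estimation.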