- Keywords: Information, Learning Dynamics, PAC-Bayes, Deep Learning
- Abstract: Whatever information a deep neural network has gleaned from past data is encoded in its weights. How this information affects the response of the network to future data is largely an open question. In fact, even how to define and measure information in a network entails some subtleties. We measure information in the weights of a deep neural network as the optimal trade-off between accuracy of the network and complexity of the weights relative to a prior. Depending on the prior, the definition reduces to known information measures such as Shannon Mutual Information and Fisher Information, but in general it affords added flexibility that enables us to relate it to generalization, via the PAC-Bayes bound, and to invariance. For the latter, we introduce a notion of effective information in the activations, which are deterministic functions of future inputs. We relate this to the Information in the Weights, and use this result to show that models of low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs. These relations hinge not only on the architecture of the model, but also on how it is trained.