Keywords: Simplicity, Representation learning, Generalization, Occam's razor, Double descent, Prediction
TL;DR: Representations that are more complex than the world they represent generalize better than simpler representations.
Abstract: Representations enable cognitive systems to generalize from known experiences to new ones. The simplicity of a representation has been linked to its generalization ability. Conventionally, simple representations are associated with a capacity to capture the structure in the data and rule out the noise. In contrast, representations with more flexibility than required to accommodate the structure of the target phenomenon risk catastrophically overfitting the observed samples and failing to generalize to new observations. Here, I computationally test this idea using a simple task: learning a representation to predict unseen features from observed ones. I simulate the process of learning a representation that has a lower, matching, or higher dimensionality than the world it is intended to capture. The results suggest that the representations of the highest dimensionality consistently generate the best out-of-sample predictions despite perfectly memorizing the training observations. These findings are in line with the recently described "double descent" of generalization error: the observation that many learning systems generalize best when overparameterized (when their representational capacity far exceeds the task requirements).
In-person Presentation: yes
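
Illustrative sketch (not the author's code): the abstract describes a simulation in which a representation of lower, matching, or higher dimensionality than the world is trained to predict unseen features from observed ones. The Python snippet below is one minimal way such an experiment could be set up. Every specific in it is an assumption made for illustration: the linear latent-cause world, the fixed random tanh units used as the representation, the minimum-norm least-squares readout, and all names and default values (simulate, d_world, n_observed, and so on). It makes no claim about the results reported in the submission.

```python
import numpy as np


def simulate(k, d_world=5, n_features=20, n_observed=10,
             n_train=30, n_test=500, noise=0.1, seed=0):
    """Return (train_mse, test_mse) for a representation with k units.

    All modelling choices here are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Assumed world: d_world latent causes linearly generate every feature.
    mixing = rng.normal(size=(d_world, n_features))

    def sample(n):
        latents = rng.normal(size=(n, d_world))
        feats = latents @ mixing + noise * rng.normal(size=(n, n_features))
        # Split each sample into observed features and features to predict.
        return feats[:, :n_observed], feats[:, n_observed:]

    x_train, y_train = sample(n_train)
    x_test, y_test = sample(n_test)

    # Assumed representation: k fixed random nonlinear units.
    proj = rng.normal(size=(n_observed, k)) / np.sqrt(n_observed)
    h_train = np.tanh(x_train @ proj)
    h_test = np.tanh(x_test @ proj)

    # Minimum-norm least-squares readout; when k >= n_train it
    # interpolates (memorizes) the training targets.
    readout = np.linalg.pinv(h_train) @ y_train

    train_mse = float(np.mean((h_train @ readout - y_train) ** 2))
    test_mse = float(np.mean((h_test @ readout - y_test) ** 2))
    return train_mse, test_mse


# Sweep the representation's dimensionality below, at, and above d_world.
for k in (2, 5, 30, 200):
    train_mse, test_mse = simulate(k)
    print(f"k={k:4d}  train MSE={train_mse:.3e}  test MSE={test_mse:.3e}")
```

Under these assumptions, a representation with at least as many units as training samples fits the training targets essentially exactly. Whether its test error is also the lowest depends on the chosen noise level, sample size, and random seed, so the sweep should be read as a starting point for exploration rather than a replication.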