The impact of task structure, representational geometry, and learning mechanism on compositional generalization

Published: 02 Mar 2024, Last Modified: 10 May 2024. ICLR 2024 Workshop Re-Align Poster. License: CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: Compositionality, out-of-distribution generalization, rich regime, kernel models, modularity, additivity, conjunctions
TL;DR: We present a theory of compositional generalization in kernel models and show how rich networks can overcome their limitations.
Abstract: Compositional generalization (the ability to respond correctly to novel arrangements of familiar components) is thought to be a cornerstone of intelligent behavior. However, a theory of how and why models generalize compositionally across diverse tasks remains lacking. To make progress on this topic, we consider compositional generalization for kernel models with fixed, potentially nonlinear representations and a trained linear readout. We prove that they are limited to conjunction-wise additive compositional computations, and identify compositionality failure modes that arise from the data distribution and the model structure. For models in the representation learning (or "rich") regime, we show that networks *can* generalize on an important non-additive task (transitive equivalence) and give a mechanistic account for why. Finally, we validate our theory empirically, showing that it captures the behavior of a convolutional network trained on a set of compositional tasks. Taken together, our theory characterizes the principles giving rise to compositional generalization in models with fixed representations, shows how representation learning can overcome their limitations, and provides a taxonomy of compositional tasks that may be useful beyond the models considered here.
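The fixed-representation setting the abstract describes can be illustrated with a minimal sketch: inputs are combinations of two discrete factors, the representation is a fixed random nonlinearity, and only a linear readout is trained, with one factor combination held out to probe compositional generalization. The encoding, target values, and feature dimensions below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two factors, each taking 3 values; an input is the concatenation of
# the two one-hot codes (a hypothetical encoding for illustration).
n_vals = 3

def encode(a, b):
    x = np.zeros(2 * n_vals)
    x[a] = 1.0
    x[n_vals + b] = 1.0
    return x

# An additive target: y(a, b) = f(a) + g(b), with made-up component values.
f = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, 1.5, -1.0])

# Fixed, random nonlinear representation; only the readout below is
# trained, mimicking the kernel (fixed-representation) regime.
W = rng.normal(size=(200, 2 * n_vals))
phi = lambda x: np.tanh(W @ x)

# Train on every factor combination except the held-out pair (2, 2).
train = [(a, b) for a in range(n_vals) for b in range(n_vals)
         if (a, b) != (2, 2)]
X = np.stack([phi(encode(a, b)) for a, b in train])
y = np.array([f[a] + g[b] for a, b in train])

# Least-squares linear readout (minimum-norm solution).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Probe the novel combination; for an additive task like this one, the
# theory predicts fixed-representation models can generalize, whereas
# non-additive tasks (e.g. transitive equivalence) require representation
# learning.
pred = phi(encode(2, 2)) @ w
target = f[2] + g[2]
```

The readout fits the training combinations exactly (the model is overparameterized), so any gap between `pred` and `target` isolates the out-of-distribution, compositional part of the problem.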
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 75