On provable length and compositional generalization

ICLR 2024 Workshop ME-FoMo, Submission 23

Published: 04 Mar 2024, Last Modified: 30 Apr 2024. ME-FoMo 2024 Poster. License: CC BY 4.0
Keywords: Length Generalization, Compositional Generalization, Transformers, State-Space Models
Abstract: Length generalization (the ability to generalize to longer sequences than those seen during training) and compositional generalization (the ability to generalize to token combinations not seen during training) are crucial forms of out-of-distribution generalization in sequence-to-sequence models. In this work, we take first steps towards provable length and compositional generalization for a range of architectures, including deep sets, transformers, state-space models, and simple recurrent neural networks. Depending on the architecture, we prove that different degrees of representation identification, e.g., a linear or a permutation relation with the ground-truth representation, are necessary for length and compositional generalization.
Submission Number: 23
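To make the two notions from the abstract concrete, here is a minimal, illustrative Python sketch (not from the paper) of how a length-generalization split and a compositional-generalization split differ from the training distribution. The task, vocabulary, maximum training length, and held-out token combination are all hypothetical choices for illustration only.

```python
# Toy illustration: train/test splits probing length and compositional generalization
# for a hypothetical sequence-to-sequence task (running sum over a small vocabulary).
import itertools
import random

VOCAB = [0, 1, 2]          # hypothetical token values
TRAIN_MAX_LEN = 4          # training sequences are at most this long
HELD_OUT_PAIR = (2, 2)     # token combination never seen adjacently during training

def target(seq):
    """Ground-truth seq2seq map: running sum of the input tokens."""
    out, s = [], 0
    for t in seq:
        s += t
        out.append(s)
    return out

def has_held_out_pair(seq):
    return any(pair == HELD_OUT_PAIR for pair in zip(seq, seq[1:]))

# Training set: short sequences, excluding the held-out token combination.
train = [
    (list(seq), target(seq))
    for n in range(1, TRAIN_MAX_LEN + 1)
    for seq in itertools.product(VOCAB, repeat=n)
    if not has_held_out_pair(seq)
]

# Length-generalization test: sequences longer than any seen during training.
length_test = [(s, target(s)) for s in
               [random.choices(VOCAB, k=TRAIN_MAX_LEN + 4) for _ in range(5)]]

# Compositional-generalization test: short sequences containing the unseen combination.
comp_test = [(list(seq), target(seq))
             for seq in itertools.product(VOCAB, repeat=TRAIN_MAX_LEN)
             if has_held_out_pair(seq)][:5]

print(len(train), "training pairs")
print("length-gen example:", length_test[0])
print("comp-gen example:  ", comp_test[0])
```

In this sketch, a model trained only on `train` is evaluated on `length_test` (longer inputs) and `comp_test` (unseen token combinations), which is the evaluation setup the abstract's two definitions describe.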