Distinct Computations Emerge From Compositional Curricula in In-Context Learning

Published: 06 Mar 2025, Last Modified: 06 Mar 2025, SCSL @ ICLR 2025, CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: in-context learning, compositional generalization, curriculum learning
TL;DR: A subtask curriculum presented in context supports robust compositional generalization by encoding subtask information.
Abstract: In-context learning (ICL) typically presents a function through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context may alter the computations that the model learns. We design a compositional algorithmic task based on the modular exponential (a double exponential task composed of two single exponential subtasks) and train transformer models to learn the task in context. We compare (a) a model trained with an in-context curriculum consisting of single exponential subtasks and (b) a model trained directly on the double exponential task without such a curriculum. We show that the model trained with a subtask curriculum can perform zero-shot inference on unseen compositional tasks and is more robust given the same context length. We study how the task is represented across the two training regimes, in particular whether subtask information is represented. We find that the model employs different mechanisms, which may change over the course of training, in a way modulated by the data properties of the in-context curriculum.
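To make the task structure in the abstract concrete, here is a minimal Python sketch of how a curriculum context and a direct context might be assembled. It is an illustration under assumptions, not the authors' data pipeline: the abstract does not give the task definition, so the forms a^x mod p for a single exponential and b^(a^x mod p) mod p for the double exponential, the modulus P = 23, the block ordering, and all function names are hypothetical.

```python
import random

P = 23  # assumed prime modulus; the abstract does not state the value used


def single_exp(base, x, p=P):
    """Assumed single exponential subtask: base^x mod p."""
    return pow(base, x, p)


def double_exp(a, b, x, p=P):
    """Assumed double exponential task: b^(a^x mod p) mod p."""
    return pow(b, pow(a, x, p), p)


def sample_pairs(fn, n, rng, p=P):
    """Draw n inputs uniformly from [0, p) and pair each with fn(input)."""
    xs = [rng.randrange(p) for _ in range(n)]
    return [(x, fn(x)) for x in xs]


def build_context(a, b, n_per_block, curriculum, rng, p=P):
    """Return a list of (input, output) pairs of length 3 * n_per_block.

    curriculum=True: a block for each assumed subtask, then the composed task.
    curriculum=False: composed-task examples only, at the same total length.
    """
    def composed(x):
        return double_exp(a, b, x, p)

    if not curriculum:
        return sample_pairs(composed, 3 * n_per_block, rng, p)

    blocks = [
        lambda x: single_exp(a, x, p),  # first subtask block
        lambda x: single_exp(b, x, p),  # second subtask block
        composed,                       # composed-task block
    ]
    context = []
    for fn in blocks:
        context.extend(sample_pairs(fn, n_per_block, rng, p))
    return context


rng = random.Random(0)
with_curriculum = build_context(2, 3, n_per_block=4, curriculum=True, rng=rng)
without_curriculum = build_context(2, 3, n_per_block=4, curriculum=False, rng=rng)
```

Both contexts have the same number of pairs, which mirrors the abstract's comparison at equal context length; how the paper actually matches lengths or orders the blocks is not specified here.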
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Presenter: ~Jin_Hwa_Lee1
Submission Number: 17