Keywords: In-context learning, compositionality, interpretability
TL;DR: We study the different strategies a transformer learns to solve a compositional modular exponentiation task when subtasks are given in-context.
Abstract: In-context learning (ICL) research often considers learning a single class of functions in-context from a uniform sample of input-output pairs. However, natural language data often exhibits more complex structural correlations, such as the composition of information within a given context. Here, we study such compositional structure in-context with a toy modular-arithmetic task and investigate how the in-context curriculum of constituent function examples may alter the computations a transformer learns to solve compositional tasks. We compare models trained with varying in-context curricula of subtask and composite task examples. We show that models trained with subtasks in-context generalize to unseen compositional tasks by building an internal representation of the intermediate computation of the subtasks. Finally, we find that the model often exhibits a continuous spectrum of compositional strategies, rather than discrete modes, modulated by curriculum design.
Primary Area: interpretability and explainable AI
Submission Number: 21138