Track: regular paper (up to 6 pages)
Keywords: In-context learning, transformers, simplicity bias, linear regression, Markov chains, multi-task
TL;DR: We train transformers on tasks of varying complexity and examine their in-context learning capabilities.
Abstract: In-context learning (ICL) is the remarkable ability of trained transformers to adapt to new tasks by leveraging a sequence of examples provided at inference time, without any additional training. Prior work on understanding ICL has primarily focused on setups with fixed task complexity (e.g., linear, logistic, or sinusoidal regression, and more recently first-order Markov chains), overlooking the diverse range of tasks that large language models encounter in practice. In this paper, we investigate ICL in transformers trained on multiple task categories of varying complexity. Our results show that, at inference time, transformers learn in-context by identifying the appropriate task complexity and accurately estimating the corresponding task parameters. We support this claim with experiments on Markov chains and linear regression tasks of varying complexity. Additionally, our experiments suggest that transformers exhibit a bias towards learning the simplest task that explains the inference-time context.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 54