Track: regular paper (up to 6 pages)
Keywords: In-context learning, transformers, simplicity bias, multi-task
TL;DR: We train transformers on tasks of varying complexity and examine their in-context learning capabilities.
Abstract: In-context learning (ICL) is the remarkable ability of trained transformers to adapt to new tasks by leveraging a sequence of examples provided at inference time—without any additional training. Prior work on understanding ICL has primarily focused on setups with fixed task complexity (e.g., linear, logistic, or sinusoidal regression, and more recently first-order Markov chains), overlooking the diverse range of tasks that large language models encounter in practice. In this paper, we investigate ICL in transformers trained on multiple task categories of varying complexity. Our results show that, during inference, transformers effectively learn in-context by identifying the appropriate task complexity and accurately estimating the corresponding task parameters. We verify our claim with experiments on Markov chains and linear regression tasks of varying complexity. Additionally, our experiments suggest that transformers exhibit a bias towards learning the simplest task that explains the inference-time context.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other, complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Presenter: ~Puneesh_Deora1
Submission Number: 54