On the Emergence of Induction Heads for In-Context Learning

18 Sept 2025 (modified: 04 Jan 2026), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: in-context learning, mechanistic interpretability, transformers, induction heads, learning dynamics, abrupt learning
TL;DR: We prove that induction heads are learned in N^2 time, where N is a transformer's context length
Abstract: Transformers have become the dominant architecture for natural language processing. Part of their success is owed to a remarkable capability known as _in-context learning_ (ICL): they can acquire and apply novel associations solely from their input context, without any updates to their weights. In this work, we study the emergence of _induction heads_, a previously identified mechanism in two-layer transformers that is particularly important for in-context learning. We uncover a relatively simple and interpretable structure of the weight matrices implementing the induction head. We theoretically explain the origin of this structure using a minimal ICL task formulation and a modified transformer architecture. We give a formal proof that the training dynamics remain constrained to a 19-dimensional subspace of the parameter space. Empirically, we validate this constraint while observing that only 3 dimensions account for the emergence of an induction head. By further studying the training dynamics inside this 3-dimensional subspace, we find that the time until the emergence of an induction head follows a tight asymptotic bound that is quadratic in the input context length.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 11101
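
For readers unfamiliar with the mechanism named in the abstract, the input-output behavior usually attributed to an induction head can be sketched in a few lines of Python. This is only an idealized, hard-coded lookup rule, not the paper's two-layer transformer, its modified architecture, or its training setup; the function name `induction_head_predict` and the string-token representation are assumptions made for illustration.

```python
from typing import Optional, Sequence


def induction_head_predict(tokens: Sequence[str]) -> Optional[str]:
    """Idealized induction-head rule (illustrative assumption, not the paper's model).

    Look for the most recent earlier occurrence of the current (last) token
    and return the token that followed it; return None if there is none.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from right to left, excluding the last position.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that followed the earlier occurrence.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    # The context contains the association A -> B; seeing A again predicts B.
    print(induction_head_predict(["A", "B", "C", "A"]))
```

The association between `A` and `B` is read entirely from the context and no parameters are updated, which is the sense of in-context learning used in the abstract. A trained transformer would implement such a rule approximately through attention rather than through an explicit loop.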