Induction Head Implementation Across Diverse Transformer Weight Constructions

03 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: Induction head
Abstract: Induction heads are a class of self-attention mechanisms empirically crucial for in-context learning in transformer models. Although previous research has suggested possible forms of induction heads, it remains unclear how they interact with other network modules and operate on those modules' outputs. In this work, we address this question by showing that a two-layer induction head allows flexibility in the construction of its first layer. This flexibility enables the induction head to operate alongside other modules within the network. Additionally, in multi-layer networks where it is difficult for induction heads to retrieve the original input, we propose a new mechanism akin to induction heads (in that it uses information about inter-token identity) that still functions in deep networks. We also perform proof-of-concept experiments showing that induction heads are trainable using only a subset of the model's layers. Our key insight is that information about which tokens are identical can be extracted from the outputs of many transformer networks, which is essential for applying the induction head mechanism. Our work demonstrates the diversity of possible realizations of induction heads, which helps explain why induction heads consistently appear across models.
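The copy-forward behavior that induction heads implement can be illustrated with a minimal sketch (an assumption for illustration only, not the paper's weight construction): given a token sequence, find the previous occurrence of the current token and predict the token that followed it.

```python
# Toy sketch of the induction-head pattern: attend to the position after
# the previous occurrence of the current token and copy that token forward.
# This is an illustrative assumption, not the constructions studied in the paper.

def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the final token; return None if no match exists."""
    last = tokens[-1]
    # Scan earlier positions (right to left) for a previous occurrence of `last`.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None

# The classic [A][B] ... [A] -> [B] completion pattern:
print(induction_predict(["A", "B", "C", "A"]))  # prints B
```

In a real transformer this lookup is realized in two layers: a first layer that shifts token-identity information forward by one position, and a second layer whose attention matches the current token against those shifted identities; the paper's point is that the first layer admits many different constructions.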
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1253