Dissecting In-Context Learning: A Mechanistic Analysis of Emergent Circuits in Small Language Models

Eva Paunova

07 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | CC BY 4.0

Keywords: in-context learning, mechanistic interpretability, language models, circuits, transformers

TL;DR: We dissect how small transformer models implement in-context learning by identifying four causal circuit types and showing their consistency across scales.

Abstract: In-context learning (ICL) enables language models to adapt to new tasks from just a few examples, yet the mechanistic basis of this capability remains poorly understood. We present a comprehensive analysis of the circuits underlying ICL in transformer models ranging from 125M to 1.3B parameters. Through systematic interventions and causal analysis, we identify four distinct circuit types that emerge during training: copy circuits that replicate patterns, induction circuits that abstract rules, composition circuits that combine information, and task recognition circuits that identify problem types. We demonstrate that these circuits are (1) causally responsible for ICL performance through targeted ablations showing 73% average performance degradation, (2) transferable across model scales with 0.82 correlation in circuit structure, and (3) surgically enhanceable, achieving 28% improvement on targeted tasks. Our analysis reveals that ICL emerges through the coordinated interaction of 12–15 critical attention heads forming interpretable computational graphs. We provide an open-source toolkit for ICL circuit analysis and demonstrate applications to model debugging and capability enhancement. These findings offer actionable insights for improving model interpretability and engineering more capable systems.
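To make the idea of a targeted ablation concrete, the following is a minimal sketch of zero-ablating one attention head in a toy single-layer model and measuring how much the final-position output changes. It is not the submission's toolkit: the weights are random, the helper names (`attention_head`, `layer`, `ablation_effect`) are hypothetical, each head's value projection is assumed to map straight back to the model dimension, and zero-ablation is assumed here because the abstract does not say which ablation variant was used.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, wq, wk, wv):
    """Causal single-head attention; returns one output vector per position."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for t in range(len(x)):
        # Position t may only attend to positions 0..t (causal mask).
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / scale
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(weights[s] * v[s][j] for s in range(t + 1))
                    for j in range(len(v[0]))])
    return out


def layer(x, heads, ablated=frozenset()):
    """Residual stream plus the sum of head outputs; ablated heads add zero."""
    out = [row[:] for row in x]
    for idx, (wq, wk, wv) in enumerate(heads):
        if idx in ablated:
            continue  # zero-ablation: drop this head's contribution entirely
        h = attention_head(x, wq, wk, wv)
        for t in range(len(x)):
            for j in range(len(x[0])):
                out[t][j] += h[t][j]
    return out


def ablation_effect(x, heads, idx):
    """L2 distance between final-position outputs with and without head idx."""
    full = layer(x, heads)[-1]
    cut = layer(x, heads, ablated={idx})[-1]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(full, cut)))


def random_matrix(rng, rows, cols):
    """Matrix of standard-normal entries (toy stand-in for trained weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy setup: all sizes and weights are illustrative assumptions.
rng = random.Random(0)
d_model, n_heads, seq_len = 4, 3, 5
x = random_matrix(rng, seq_len, d_model)
heads = [tuple(random_matrix(rng, d_model, d_model) for _ in range(3))
         for _ in range(n_heads)]

# Rank heads by how much removing each one perturbs the last position.
effects = [ablation_effect(x, heads, i) for i in range(n_heads)]
ranking = sorted(range(n_heads), key=lambda i: effects[i], reverse=True)
```

In a real study the effect would be measured on a task metric such as few-shot accuracy, not on an output-vector distance, and mean-ablation or activation patching are common alternatives to zeroing a head.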

Primary Area: interpretability and explainable AI

Submission Number: 2843
