Mechanistic Study of Transformer In-Context Learning with Categorical Outputs

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: transformers, in-context learning, mechanistic understanding
TL;DR: Transformer mechanisms are examined through the lens of in-context learning with categorical observations.
Abstract: We study in-context learning (ICL) with Transformers for categorical outputs $y_i$, a setting largely unexplored compared to research on real-valued $y_i$. While attention-only Transformers can, in principle, perform functional gradient descent (GD) inference for real-valued outputs, we show that categorical $y_i$ introduce a non-linearity in GD that attention-only models cannot capture. This reveals a crucial role for the Transformer's multi-layer perceptron (MLP) layers, which we show are generally necessary for categorical ICL. However, we also analyze conditions under which attention-only models can, surprisingly, still perform well. Since training for categorical ICL requires substantial data, we propose a sparse Transformer parametrization linked to functional GD. This model trains far more efficiently with minimal performance degradation compared to an unconstrained Transformer. Our sparse design proves particularly valuable for data-limited applications, which we demonstrate through the ICL analysis of human surgical procedures.
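As an illustrative sketch (not the submission's exact derivation), one step of functional GD on $n$ in-context examples makes the contrast concrete: under squared loss for real-valued $y_i$, the update at a query $x$ is a kernel-weighted linear combination of residuals, which attention can represent; under cross-entropy with a softmax link for categorical $y_i$, the residual depends non-linearly on the current predictions. The kernel $K$, step size $\eta$, and one-hot encoding of $y_i$ below are assumptions made for illustration.

$$
f(x) \;\leftarrow\; f(x) + \eta \sum_{i=1}^{n} K(x, x_i)\,\big(y_i - f(x_i)\big) \qquad \text{(real-valued, squared loss)}
$$

$$
f(x) \;\leftarrow\; f(x) + \eta \sum_{i=1}^{n} K(x, x_i)\,\big(y_i - \mathrm{softmax}(f(x_i))\big) \qquad \text{(categorical, cross-entropy)}
$$

The $\mathrm{softmax}(f(x_i))$ term is the prediction-dependent non-linearity that a purely linear attention readout cannot supply on its own, which is where the abstract's claim about MLP layers enters.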
Primary Area: interpretability and explainable AI
Submission Number: 18142