Mechanistic Study of Transformer In-Context Learning with Categorical Outputs

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: transformers, in-context learning, mechanistic understanding
TL;DR: Transformer mechanisms are examined through the lens of in-context learning with categorical observations.
Abstract: We study in-context learning (ICL) with Transformers for categorical outputs $y_i$, a setting largely unexplored compared to research on real-valued $y_i$. While attention-only Transformers can, in principle, perform functional gradient descent (GD) inference for real-valued outputs, we show that categorical $y_i$ introduce a non-linearity in GD that attention-only models cannot capture. This reveals a crucial role for the Transformer's multi-layer perceptron (MLP) layers, which we show are generally necessary for categorical ICL. However, we also analyze conditions under which attention-only models can, surprisingly, still perform well. Since training for categorical ICL requires substantial data, we propose a sparse Transformer parametrization linked to functional GD. This model trains far more efficiently with minimal performance degradation compared to an unconstrained Transformer. Our sparse design proves particularly valuable for data-limited applications, which we demonstrate through the ICL analysis of human surgical procedures.
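As an illustrative sketch (not the submission's exact derivation), one step of functional GD on $n$ in-context examples makes the contrast concrete: under squared loss for real-valued $y_i$, the update at a query $x$ is a kernel-weighted linear combination of residuals, which attention can represent; under cross-entropy with a softmax link for categorical $y_i$, the residual depends non-linearly on the current predictions. The kernel $K$, step size $\eta$, and one-hot encoding of $y_i$ below are assumptions made for illustration.

$$
f(x) \;\leftarrow\; f(x) + \eta \sum_{i=1}^{n} K(x, x_i)\,\big(y_i - f(x_i)\big) \qquad \text{(real-valued, squared loss)}
$$

$$
f(x) \;\leftarrow\; f(x) + \eta \sum_{i=1}^{n} K(x, x_i)\,\big(y_i - \mathrm{softmax}(f(x_i))\big) \qquad \text{(categorical, cross-entropy)}
$$

The $\mathrm{softmax}(f(x_i))$ term is the prediction-dependent non-linearity that a purely linear attention readout cannot supply on its own, which is where the abstract's claim about MLP layers enters.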
Primary Area: interpretability and explainable AI
Submission Number: 18142