In-Context Algorithm Emulation in Fixed-Weight Transformers

ICLR 2026 Conference Submission 558 Authors

01 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0
Keywords: In-Context Learning, Attention Mechanism, In-Context Gradient Descent, Transformer, Universal Approximation
TL;DR: We prove that a minimal Transformer with frozen weights emulates a broad class of algorithms by in-context prompting
Abstract: We prove that a minimal Transformer with frozen weights emulates a broad class of algorithms by in-context prompting. We formalize two modes of in-context algorithm emulation. In the *task-specific mode*, for any continuous function $f: \mathbb{R} \to \mathbb{R}$, we show the existence of a single-head softmax attention layer whose forward pass reproduces functions of the form $f(w^\top x - y)$ to arbitrary precision. This general template subsumes many popular machine learning algorithms (e.g., gradient descent, linear regression, ridge regression). In the *prompt-programmable mode*, we prove universality: a single fixed-weight two-layer softmax attention module emulates all algorithms from the task-specific class (i.e., each implementable by a single softmax attention) via prompting alone. Our key idea is to construct prompts that encode an algorithm’s parameters into token representations, creating sharp dot-product gaps that force the softmax attention to follow the intended computation. This construction requires no feed-forward layers and no parameter updates; all adaptation happens through the prompt alone. Numerical results corroborate our theory. These findings forge a direct link between in-context learning and algorithmic emulation, and offer a simple mechanism by which large Transformers can serve as prompt-programmable interpreters of algorithms. They illuminate how GPT-style foundation models may swap algorithms via prompts alone, and establish a form of algorithmic universality in modern Transformer models.
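Below is a minimal numerical sketch in the spirit of the abstract's construction, not taken from the paper: the grid of prompt tokens, the score formula, and the inverse temperature `beta` are assumptions chosen for illustration. It shows a frozen single-head softmax attention whose prompt encodes grid points as keys and precomputed values of $f$ as values, so that sharp dot-product gaps make the softmax concentrate on the grid point nearest to $w^\top x - y$.

```python
import numpy as np

# Illustrative sketch (assumed construction, not the paper's exact one):
# a frozen single-head softmax attention emulating f(w^T x - y) via a prompt
# that encodes grid points as keys and f evaluated on the grid as values.

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_emulate(f, w, x, y, grid, beta=1e4):
    """Approximate f(w^T x - y) with one softmax attention call.

    `grid` plays the role of prompt-encoded key tokens and f(grid) the
    value tokens. A large inverse temperature `beta` sharpens the
    dot-product gaps so the softmax concentrates on the grid point
    closest to the query scalar s = w^T x - y.
    """
    s = w @ x - y
    # Equivalent to a dot product between query (s, 1) and key (g, -g^2/2),
    # scaled by beta; the score is maximized at the grid point g closest to s.
    scores = beta * (s * grid - 0.5 * grid**2)
    weights = softmax(scores)
    values = f(grid)                   # prompt-encoded value tokens
    return weights @ values

# Example: emulate the squared residual f(s) = s**2 (regression-style loss).
rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), 0.7
grid = np.linspace(-5.0, 5.0, 4001)
approx = attention_emulate(lambda s: s**2, w, x, y, grid)
exact = (w @ x - y) ** 2
print(approx, exact)  # the two values should agree closely
```

Increasing `beta` (and refining the grid) drives the softmax toward a hard lookup, which is the behavior the abstract describes as sharp dot-product gaps forcing the intended computation.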
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 558