Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Transformer, Universal Approximation, In-context Learning, Vocabulary
TL;DR: We answer the question: does the universal approximation property still hold when the context in in-context learning is restricted to a finite set?
Abstract: Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In function-approximation settings, the context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, however, the context consists of tokens drawn from a finite set, referred to as a vocabulary; this is the setting considered in this paper, which we call vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers without positional encoding does not possess the UAP, whereas the UAP can be achieved once positional encoding is included. Several sufficient conditions on the positional encoding are provided. Our findings reveal the benefits of positional encoding, from an approximation-theory perspective, for in-context learning.
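To make the setting concrete, below is a minimal, illustrative sketch (not the paper's construction; all names and the toy dimensions are assumptions) of a single-layer softmax-attention map whose context tokens come from a finite vocabulary, with an optional additive positional encoding. Without positional encoding, the output of such a layer is invariant to permutations of the context rows, which restricts what a finite-vocabulary context can encode; adding a positional encoding breaks this symmetry.

```python
import numpy as np

# Illustrative sketch of the VICL setting (notation assumed, not the paper's):
# a single attention layer whose context rows are drawn from a finite
# vocabulary, with an optional additive positional encoding.

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_layer_attention(query, context, W_q, W_k, W_v, pos_enc=None):
    """One softmax-attention layer: the query attends over the context.

    context : (n, d) rows taken from a finite vocabulary {v_1, ..., v_M}
    pos_enc : optional (n, d) positional encoding added to the context
    """
    ctx = context + pos_enc if pos_enc is not None else context
    scores = (query @ W_q) @ (ctx @ W_k).T        # (1, n) attention logits
    weights = softmax(scores, axis=-1)            # convex combination weights
    return weights @ (ctx @ W_v)                  # (1, d) output

# Toy instantiation: without pos_enc, permuting the context rows leaves the
# output unchanged; with pos_enc, different orderings can yield different outputs.
rng = np.random.default_rng(0)
d, n, M = 4, 6, 3
vocab = rng.normal(size=(M, d))
context = vocab[rng.integers(0, M, size=n)]       # tokens from a finite set
W_q = W_k = W_v = np.eye(d)
query = rng.normal(size=(1, d))
pe = 0.1 * rng.normal(size=(n, d))

out_no_pe = single_layer_attention(query, context, W_q, W_k, W_v)
out_pe = single_layer_attention(query, context, W_q, W_k, W_v, pos_enc=pe)
```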
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 9346