Keywords: In-Context Learning, Transformer Approximation Theory, Kernel Regression on Manifold
Abstract: While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. This paper initiates a theoretical study of ICL for regression of H\"older functions on manifolds. We establish a novel connection between the attention mechanism and classical kernel methods, demonstrating that transformers effectively perform kernel-based prediction at a new query through the query's interaction with the prompt. This connection is validated by numerical experiments, which reveal that the learned query–prompt scores for H\"older functions are highly correlated with the Gaussian kernel. Building on this insight, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks is observed, transformers achieve the minimax regression rate for H\"older functions on manifolds, which scales exponentially with the intrinsic dimension of the manifold rather than the ambient space dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context kernel algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools for studying ICL of nonlinear models.
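For intuition only, below is a minimal sketch of the kind of attention–kernel connection the abstract describes: a softmax score over squared distances between the query and the prompt inputs yields a Nadaraya–Watson (Gaussian-kernel) prediction of the label at the query. The squared-distance score, the bandwidth tau, and the toy circle-shaped manifold embedded in R^3 are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def softmax_attention_prediction(x_query, X_prompt, y_prompt, tau=1.0):
    """Predict the label at x_query from in-context examples (X_prompt, y_prompt).
    With a score of -||x_query - x_i||^2 / (2 tau^2), the softmax-weighted average
    of prompt labels equals a Nadaraya-Watson estimator with a Gaussian kernel."""
    sq_dists = np.sum((X_prompt - x_query) ** 2, axis=1)   # squared distances to prompt points
    scores = -sq_dists / (2.0 * tau ** 2)                  # attention-style scores
    weights = np.exp(scores - scores.max())                # numerically stable softmax
    weights /= weights.sum()
    return weights @ y_prompt                              # weighted average of prompt labels

# Toy example (assumed setup): a 1-D manifold (circle) embedded in R^3,
# with a smooth target observed under noise in the prompt.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=64)                          # intrinsic coordinate
X = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
y = np.sin(3 * theta) + 0.1 * rng.standard_normal(64)               # noisy prompt labels
x_new = np.array([np.cos(0.5), np.sin(0.5), 0.0])                   # query on the manifold
print(softmax_attention_prediction(x_new, X, y, tau=0.3))
```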
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 14612