In-context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-separation
Keywords: In-context learning, linear dynamical systems, depth-separation
TL;DR: We show that shallow linear transformers fail to in-context learn linear dynamical systems, uncovering a distinction between in-context learning over IID and non-IID data; in contrast, transformers with logarithmic depth successfully learn such systems.
Abstract: This paper investigates approximation-theoretic aspects of the in-context learning capability of transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an $L^2$-testing loss defined uniformly across tasks. This result demonstrates that transformers with logarithmic depth can achieve error bounds comparable to those of the least-squares estimator. In contrast, our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers, suggesting a depth-separation phenomenon for transformers in the in-context learning of dynamical systems. Moreover, this second result uncovers a critical distinction in the approximation power of single-layer linear transformers when learning from IID versus non-IID data.
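To make the setting concrete, here is a minimal NumPy sketch of the kind of task involved: a noisy linear dynamical system $x_{t+1} = A x_t + w_t$ and the least-squares estimate of $A$ from a single trajectory, the baseline the abstract's upper bound is compared against. The specific dimensions, noise scale, and stability normalization are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Hypothetical illustration (not the paper's exact setup): a noisy linear
# dynamical system x_{t+1} = A x_t + w_t, with A estimated by least squares
# from one observed trajectory.
rng = np.random.default_rng(0)
d, T = 3, 200                       # state dimension, trajectory length (assumed)
A = rng.standard_normal((d, d))
A /= max(1.0, np.max(np.abs(np.linalg.eigvals(A))) / 0.9)  # keep spectral radius <= 0.9

x = np.zeros((T + 1, d))
for t in range(T):
    x[t + 1] = A @ x[t] + 0.1 * rng.standard_normal(d)     # noisy transition

# Least-squares estimator: A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2.
# lstsq solves X_past @ B ~= X_next, so B approximates A^T.
X_past, X_next = x[:-1], x[1:]
A_hat = np.linalg.lstsq(X_past, X_next, rcond=None)[0].T

print("estimation error ||A_hat - A||_F:", np.linalg.norm(A_hat - A))
```

The non-IID nature of the data is visible here: each regression input $x_t$ depends on all past noise terms, which is exactly the regime where the paper's lower bound separates single-layer linear transformers from deeper ones.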
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 13270