Keywords: In-context Learning, Generalisation Error
TL;DR: This paper establishes generalisation bounds for Transformer-based models in in-context learning under non-i.i.d. scenarios.
Abstract: In-context learning (ICL) has delivered significant performance improvements in Transformer-based large models. This study identifies two key factors that govern ICL generalisation under complex non-i.i.d. scenarios: algorithmic stability and distributional discrepancy. First, we establish a stability bound for Transformer-based models trained with mini-batch gradient descent, revealing how specific optimisation configurations interact with the smoothness of the loss landscape to ensure the stability of non-linear Transformers. Next, we introduce a distribution-level discrepancy measure that highlights the importance of aligning the ICL prompt distribution with the training data distribution to achieve effective generalisation. Building on these insights, we derive a generalisation error bound for ICL with asymptotic convergence guarantees, which further reveals that token-wise prediction errors accumulate over time and can even lead to generalisation collapse if the prediction length is not properly constrained. Finally, we provide empirical evaluations that validate our theoretical findings.
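To make the structure of the result concrete, the following is an illustrative sketch (not the paper's exact bound) of how the three ingredients named in the abstract typically combine; the symbols \(\epsilon_{\mathrm{stab}}\), \(d(\cdot,\cdot)\), \(\delta_t\), and \(T\) are placeholders standing in for the stability term, the discrepancy measure, per-token errors, and the prediction length, respectively.

% Illustrative decomposition only; the exact quantities are defined in the paper.
\[
  \mathbb{E}\big[\mathcal{R}(\hat{f}) - \hat{\mathcal{R}}_n(\hat{f})\big]
  \;\lesssim\;
  \underbrace{\epsilon_{\mathrm{stab}}(n)}_{\text{algorithmic stability}}
  \;+\;
  \underbrace{d\big(\mathcal{D}_{\mathrm{prompt}},\, \mathcal{D}_{\mathrm{train}}\big)}_{\text{distributional discrepancy}}
  \;+\;
  \underbrace{\textstyle\sum_{t=1}^{T}\delta_t}_{\text{accumulated token-wise error}} .
\]

Under this reading, the stability term shrinks with the training sample size, the discrepancy term vanishes only when prompts are drawn close to the training distribution, and the final sum grows with the prediction length \(T\), which is why an unconstrained \(T\) can drive the bound to collapse.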
Primary Area: learning theory
Submission Number: 15662