Provable Strategic In-Context Learning of Transformers

18 Sept 2025 (modified: 18 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: In-context Learning, Reinforcement Learning, Transformers
Abstract: In-context reinforcement learning (ICRL) enables Transformers to adapt to new decision-making tasks from context alone, without parameter updates. Recent works demonstrate that Transformers can replicate reinforcement learning strategies, such as V-learning, in uncoupled learning environments. This provides valuable insights and lays a theoretical foundation for applying Transformers to strategic settings. However, these approaches rely on external sampling mechanisms during inference, introducing an artificial layer on top of the original Transformer architecture. This work investigates whether Transformers can perform in-context game playing in matrix and Markov zero-sum games entirely within their architecture, without external procedures. We present the first theoretical result showing that Transformers can approximate online mirror descent (OMD) dynamics in both repeated and sequential multi-agent games, with an approximation accuracy that depends on model size. Additionally, we demonstrate the benefit of pre-training on longer trajectories under appropriate model architecture choices.
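For readers unfamiliar with the dynamics the abstract refers to, the following is a minimal sketch of entropic OMD (equivalently, multiplicative weights) self-play in a matrix zero-sum game. It illustrates only the update rule the Transformers are shown to approximate, not the paper's construction; the step size `eta`, horizon `T`, and the `omd_zero_sum` helper are illustrative choices, not from the paper.

```python
import numpy as np

def omd_zero_sum(A, eta=0.1, T=1000):
    """Entropic OMD (multiplicative weights) self-play on min_x max_y x^T A y."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)  # row player's mixed strategy (minimizer)
    y = np.full(n, 1.0 / n)  # column player's mixed strategy (maximizer)
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(T):
        gx = A @ y    # row player's loss vector at the current strategies
        gy = A.T @ x  # column player's gain vector
        # Mirror-descent step with the entropic regularizer:
        # exponential reweighting followed by normalization to the simplex.
        x = x * np.exp(-eta * gx); x /= x.sum()
        y = y * np.exp(eta * gy);  y /= y.sum()
        x_avg += x; y_avg += y
    # Averaged iterates of no-regret dynamics approach a Nash equilibrium
    # in zero-sum games.
    return x_avg / T, y_avg / T

# Example: matching pennies, whose unique equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = omd_zero_sum(A)
print(x_bar, y_bar)  # both close to [0.5, 0.5]
```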
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 11836