Towards Unpredictable Worlds: Continual In-Context Reinforcement Learning in Non-Stationary Environments
Keywords: In-Context Reinforcement Learning, Sequence Model, Non-Stationarity
Abstract: Traditional In-Context Reinforcement Learning (ICRL) demonstrates impressively rapid adaptation, but its reliance on static environments limits its applicability.
In contrast, real-world scenarios are inherently non-stationary, with continuous and unpredictable changes that challenge an agent's ability to adapt.
To bridge this gap, we formally define and systematically investigate Continual In-Context Reinforcement Learning in Non-Stationary Environments.
Our central question is: what model architectures and training strategies enable an agent not only to rapidly master new dynamics in a continuously evolving environment, but also to efficiently discard or isolate outdated information, thereby achieving robust online adaptation?
To ground our investigation, we construct a new benchmark suite featuring two complementary non-stationary domains---a symbolic reasoning task and a physics-based control task---each modified to exhibit unpredictable changes in dynamics within a single agent lifetime.
On these benchmarks, we conduct extensive evaluations at both the model and training-strategy levels.
At the model level, we compare state-of-the-art sequence model architectures.
At the training-strategy level, we systematically analyze the influence of stationary versus non-stationary training, the frequency of dynamics changes, context length, and interaction scale.
Our findings demonstrate the necessity of non-stationary training and reveal critical factors shaping continual adaptation.
These results provide actionable insights and design principles for building agents capable of learning and adapting in truly open and dynamic worlds.
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 16523