Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics

ICLR 2026 Conference Submission 20632 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: zero-shot reinforcement learning, unsupervised reinforcement learning, successor measure
TL;DR: We provide both theoretical and empirical evidence that Forward–Backward representations cannot adapt to changing dynamics and introduce a method that overcomes this, generalizing to both seen and unseen dynamics at test time.
Abstract: Behavioral Foundation Models (BFMs) have proved successful at producing near-optimal policies for arbitrary tasks in a zero-shot manner, requiring no test-time retraining or task-specific fine-tuning. Among the most promising BFMs are those that estimate the successor measure, learned in an unsupervised way from task-agnostic offline data. However, these methods fail to react to changes in the dynamics, making them inefficient under partial observability or when the transition function changes. This hinders the applicability of BFMs in real-world settings, e.g., in robotics, where the dynamics can unexpectedly change at test time. In this work, we demonstrate that the Forward–Backward (FB) representation, one method from the BFM family, cannot produce reasonable policies under distinct dynamics, leading to interference among the latent policy representations. To address this, we propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation. Additionally, we show that partitioning the policy encoding space into dynamics-specific clusters, aligned with the context-embedding directions, yields an additional gain in performance. These traits allow our method to respond to the dynamics mismatches observed during training and to generalize to unseen ones. Empirically, in the changing-dynamics setting, our approach achieves up to 2x higher zero-shot returns than the baselines on both discrete and continuous tasks.
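To make the described mechanism concrete, below is a minimal PyTorch sketch of the two pieces the abstract names: a transformer-based belief estimator that summarizes a window of recent transitions into a context embedding, and an FB forward network conditioned on that embedding so that Q-values take the usual FB form Q(s, a) ≈ F(s, a, z, c)ᵀ z. All module names (BeliefEstimator, ContextConditionedFB), dimensions, and architectural choices here are illustrative assumptions, not the submission's actual architecture.

```python
import torch
import torch.nn as nn


class BeliefEstimator(nn.Module):
    """Transformer encoder that summarizes a window of recent (s, a, s')
    transitions into a context embedding c approximating the current dynamics.
    Hypothetical sketch; sizes and pooling are arbitrary choices."""

    def __init__(self, obs_dim: int, act_dim: int, ctx_dim: int = 32,
                 d_model: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(2 * obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, ctx_dim)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, window, 2*obs_dim + act_dim)
        h = self.encoder(self.embed(transitions))
        # Mean-pool over the window, then project to the context embedding.
        return self.head(h.mean(dim=1))


class ContextConditionedFB(nn.Module):
    """Forward network F(s, a, z, c) of an FB model, conditioned on the
    belief context c in addition to the task/policy latent z."""

    def __init__(self, obs_dim: int, act_dim: int, z_dim: int, ctx_dim: int):
        super().__init__()
        self.forward_net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + z_dim + ctx_dim, 256),
            nn.ReLU(),
            nn.Linear(256, z_dim),
        )

    def forward(self, s, a, z, c):
        return self.forward_net(torch.cat([s, a, z, c], dim=-1))


# Usage sketch: estimate the dynamics context from a short interaction
# history, then score a state-action pair for a given task vector z.
obs_dim, act_dim, z_dim, ctx_dim, window = 17, 6, 50, 32, 16
belief = BeliefEstimator(obs_dim, act_dim, ctx_dim)
fb = ContextConditionedFB(obs_dim, act_dim, z_dim, ctx_dim)

hist = torch.randn(1, window, 2 * obs_dim + act_dim)  # recent (s, a, s') tuples
c = belief(hist)
s, a = torch.randn(1, obs_dim), torch.randn(1, act_dim)
z = torch.nn.functional.normalize(torch.randn(1, z_dim), dim=-1)  # task vector
q = (fb(s, a, z, c) * z).sum(-1)  # successor-feature-style Q-value
```

The dynamics-specific clustering of the policy encoding space mentioned in the abstract would act on z (e.g., shifting z along the context-embedding direction given by c); that step is omitted here, as the paper does not spell out the operation in this summary.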
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20632