Meta-GC-TTT: Training Offline Goal-Conditioned Policies for Test-Time Adaptation

Published: 25 May 2026, Last Modified: 27 May 2026
Venue: DEMO 2026 Poster
License: CC BY 4.0
Keywords: meta-learning, test-time training, offline RL, GC-TTT
Abstract: In offline goal-conditioned reinforcement learning, Test-Time Training (TTT) can specialize a pre-trained policy to the current state and goal at deployment. This turns a broad goal-conditioned policy into a query-specific expert. Yet standard offline pre-training optimizes the policy's performance before TTT, not after it. As a result, the policy is not trained for the gradient dynamics it will face at test time. We introduce Meta-GC-TTT, a framework for learning test-time-trainable goal-conditioned policies. Meta-GC-TTT samples state-goal tasks from the offline dataset, adapts the policy with TTT, and updates the base policy to maximize post-TTT performance. Our evaluation on the OGBench loco-navigation suite demonstrates that meta-learned initializations significantly improve zero-shot performance and achieve a 3-5× increase in adaptation efficiency. Overall, offline goal-conditioned policies should be trained not only to act, but also to adapt.
Submission Number: 107