Keywords: Goal-conditioned reinforcement learning, Offline reinforcement learning, Test-time planning, Graph search, Reinforcement Learning, Planning
TL;DR: Test-Time Graph Search (TTGS) uses value-derived distances to plan subgoals over dataset states, boosting long-horizon GCRL without additional training.
Abstract: Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision-making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning wrapper for pretrained GCRL policies that uses only the pretraining dataset. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs a fast graph search to assemble a sequence of subgoals that a frozen policy then executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference time. On the OGBench benchmark, TTGS improves the success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
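The abstract does not pin down the value-to-distance transform, the edge rule, or the search algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a reward of -1 per step with discount gamma, so a value V(s, g) can be inverted into an estimated step count; it keeps an edge between two dataset states only when that estimate is below a threshold; and it uses Dijkstra's algorithm to return the subgoal sequence a frozen policy would follow. The names `value_to_distance`, `plan_subgoals`, `max_edge`, and `toy_value` are hypothetical, and start and goal are assumed to already be indices into the dataset states.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


def value_to_distance(v: float, gamma: float = 0.99) -> float:
    """Invert a goal-conditioned value into an estimated number of steps.

    Assumption (hypothetical): reward -1 per step until the goal, so that
    V = -(1 - gamma**d) / (1 - gamma), hence d = log(1 + (1 - gamma) V) / log(gamma).
    """
    inner = 1.0 + (1.0 - gamma) * v
    if inner <= 0.0:  # at or beyond the infinite-horizon bound: treat as unreachable
        return math.inf
    return max(0.0, math.log(inner) / math.log(gamma))


def plan_subgoals(
    states: List[State],
    start: int,
    goal: int,
    distance: Callable[[State, State], float],
    max_edge: float,
) -> List[int]:
    """Dijkstra over dataset states; returns state indices from start to goal, or []."""
    n = len(states)
    # Dense O(n^2) graph construction: fine for a sketch, too slow for large datasets.
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = distance(states[i], states[j])
                if d <= max_edge:  # keep only short, presumably reliable hops
                    adj[i].append((j, d))
    dist = [math.inf] * n
    prev = [-1] * n
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:  # stale heap entry
            continue
        if u == goal:
            break
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if math.isinf(dist[goal]):
        return []
    path = [goal]
    while path[-1] != start:  # walk predecessor links back to the start
        path.append(prev[path[-1]])
    return path[::-1]


GAMMA = 0.99


def toy_value(s: State, g: State) -> float:
    # Hypothetical stand-in for a learned V(s, g): the exact discounted value
    # of reaching g in |s - g| unit steps along a line.
    d = abs(s[0] - g[0])
    return -(1.0 - GAMMA ** d) / (1.0 - GAMMA)


def dist_fn(a: State, b: State) -> float:
    return value_to_distance(toy_value(a, b), GAMMA)


states = [(float(x),) for x in range(6)]
subgoals = plan_subgoals(states, start=0, goal=5, distance=dist_fn, max_edge=1.5)
print(subgoals)  # → [0, 1, 2, 3, 4, 5]
```

With `max_edge=1.5`, only neighboring states are connected, so the plan steps through every intermediate state; the frozen policy would then be conditioned on each subgoal in turn instead of on the distant final goal. The threshold trades off reliability of each hop against path length and is an illustrative choice here, not a setting taken from the paper.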
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 14079