TL;DR: We propose a graph-based offline HRL framework that significantly improves long-horizon reasoning and trajectory stitching.
Abstract: Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
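The pipeline described in the abstract (TDR embedding, clustering of temporally close states into graph nodes, and shortest-path subgoal selection) can be illustrated with a small sketch. This is only an illustrative reconstruction under assumptions: the encoder stub tdr_encode, the thresholds merge_radius and edge_radius, the greedy incremental clustering, and the use of networkx are our own placeholders rather than the released GAS implementation, and the Temporal Efficiency (TE) filtering step is omitted.

import numpy as np
import networkx as nx

def tdr_encode(states):
    # Placeholder for the learned Temporal Distance Representation encoder,
    # which maps raw states into a space where distance reflects the number
    # of environment steps needed to travel between them.
    return np.asarray(states, dtype=np.float32)

def build_graph(trajectories, merge_radius=1.0, edge_radius=2.0):
    # Cluster TDR embeddings from all trajectories into shared nodes, then
    # connect nodes that are temporally close, enabling cross-trajectory stitching.
    nodes = []  # list of (centroid, count)
    graph = nx.Graph()
    for traj in trajectories:
        for z in tdr_encode(traj):
            if nodes:
                dists = np.linalg.norm(np.stack([c for c, _ in nodes]) - z, axis=1)
                i = int(np.argmin(dists))
                if dists[i] < merge_radius:
                    # Merge the state into the nearest existing node.
                    c, n = nodes[i]
                    nodes[i] = ((c * n + z) / (n + 1), n + 1)
                    continue
            nodes.append((z, 1))
            graph.add_node(len(nodes) - 1)
    centroids = np.stack([c for c, _ in nodes])
    for i in range(len(nodes)):
        d = np.linalg.norm(centroids - centroids[i], axis=1)
        for j in np.where((d > 0) & (d < edge_radius))[0]:
            # Edge weight approximates temporal distance, so shortest paths
            # prefer temporally efficient routes.
            graph.add_edge(i, int(j), weight=float(d[j]))
    return graph, centroids

def plan_subgoals(graph, centroids, start_state, goal_state):
    # Select the subgoal sequence as the shortest path between the nodes
    # closest to the current state and the goal state.
    z_start, z_goal = tdr_encode([start_state, goal_state])
    start = int(np.argmin(np.linalg.norm(centroids - z_start, axis=1)))
    goal = int(np.argmin(np.linalg.norm(centroids - z_goal, axis=1)))
    path = nx.shortest_path(graph, start, goal, weight="weight")
    return centroids[path]  # subgoals handed to the low-level policy

In this sketch, graph search replaces the learned high-level policy: subgoal selection costs one shortest-path query rather than a policy rollout, which is the key efficiency argument of the abstract.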
Lay Summary: To accomplish complex tasks, AI robots benefit from breaking down long-horizon goals into smaller, manageable subgoals and learning how to act at each step. This approach is known as Hierarchical Reinforcement Learning (HRL), where a high-level policy learns to generate subgoals and a low-level policy learns to reach them through appropriate actions.
However, existing HRL methods typically learn within individual trajectories, which limits their ability to generalize when only fragmented or subtask-level trajectories are available. These methods often struggle to stitch together knowledge across different experiences, resulting in poor performance on long-horizon tasks.
To address this challenge, we propose an unsupervised graph construction method that learns potential connectivity between states across different trajectories using a temporal distance representation. The high-level planner builds a path toward long-horizon goals over this graph, while the low-level policy learns to navigate between connected nodes. Our method achieves superior performance compared to existing offline HRL approaches across a variety of tasks, demonstrating its effectiveness.
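To make the division of labor in the lay summary concrete, the sketch below shows how a low-level policy could be trained to move between connected graph nodes. It assumes a simple goal-conditioned behavioral-cloning objective on the offline data; the network sizes, the relabelling of future states as subgoals, and the squared-error loss are illustrative assumptions, not the objective actually used by GAS.

import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    # Goal-conditioned policy: maps (state, subgoal) to an action.
    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], dim=-1))

def bc_update(policy, optimizer, batch):
    # One gradient step: imitate the dataset action, conditioned on a subgoal
    # relabelled from a future state of the same trajectory.
    pred = policy(batch["state"], batch["subgoal"])
    loss = ((pred - batch["action"]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At execution time, the planner from the graph supplies the next subgoal, and this policy is queried step by step until the subgoal is reached, after which the planner advances to the next node on the path.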
Link To Code: https://github.com/qortmdgh4141/GAS
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Offline Hierarchical Reinforcement Learning, Offline Goal-Conditioned Reinforcement Learning, Graph-based Reinforcement Learning, Temporal Distance Representation Learning
Submission Number: 758