Don't Guess the Future, Find the Bottleneck: Spectral Subgoals for Offline Goal-Conditioned RL

ICLR 2026 Conference Submission16443 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: offline goal-conditioned reinforcement learning
Abstract: Offline goal-conditioned RL (OGCRL) learns to reach arbitrary goals from an offline dataset, but long-horizon performance hinges on crossing a handful of hard-to-cross bottlenecks. These bottlenecks not only dictate the feasible paths toward the goal but also act as critical keypoints, marking the transitions between adjacent regions and providing the agent with essential directional guidance. Prior hierarchical methods pick subgoals by time or short-horizon value heuristics, which do not localize the bottleneck; as a result, the agent loses the clear guidance that bottlenecks could provide about where to pass next. We instead model long-horizon planning as "cross the next bottleneck": we apply Laplacian spectral clustering to the offline dataset to expose bottlenecks, identify trajectories in the dataset that cross the resulting cluster boundaries, and define the crossing states as keypoints (KPs). The most representative KPs are then automatically selected, and a directed KP reachability graph $\mathcal G_{\mathrm{KP}}$ is constructed over them. We restrict high-level choices to these bottleneck states and use a pluggable low-level controller to execute the short transitions between them. We provide theory showing that the next bottleneck is the optimal one-step subgoal and that Laplacian spectra recover bottlenecks with high overlap; Laplacian spectral clustering therefore discovers approximately optimal subgoals. Empirically, the same pattern holds: across D4RL and OGBench, our method achieves state-of-the-art results on a broad set of navigation and manipulation tasks and across diverse dataset regimes, for example, **96.5\%** on **AntMaze** and **84.5\%** on **Franka-Kitchen**.
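The core spectral step can be illustrated on a toy graph. The sketch below is a hypothetical, minimal illustration of the idea (not the paper's implementation): build a state-adjacency graph, take the Fiedler vector of its Laplacian to bipartition the states, and flag the edges that cross the partition as bottleneck crossings whose endpoints would play the role of keypoints (KPs). The graph, node labels, and two-cluster setup are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy graph: two dense clusters {0,1,2} and {3,4,5}
# joined by a single "bottleneck" edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# The Fiedler vector (eigenvector of the second-smallest eigenvalue)
# yields a spectral bipartition via its sign pattern.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)

# Edges whose endpoints land in different clusters cross the boundary;
# their endpoints are the candidate keypoints in the paper's terminology.
bottleneck_edges = [(i, j) for i, j in edges if labels[i] != labels[j]]
print(bottleneck_edges)  # → [(2, 3)]
```

In the paper's setting, the graph would instead be estimated from dataset transitions and the clustering would use more than two components, but the principle is the same: spectral structure exposes the sparse cuts, and states on those cuts become high-level subgoals.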
Primary Area: reinforcement learning
Submission Number: 16443