Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

ICLR 2026 Conference Submission 18077 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: goal-conditioned reinforcement learning, hierarchical reinforcement learning, sparse reward, long-horizon tasks, graph-based policy learning, subgoal planning
TL;DR: We propose a new hierarchical RL framework (SSE) that, instead of relabeling subgoal failures as successes, treats them as terminal failures with zero reward, dramatically improving the high-level planner's reliability and efficiency.
Abstract: Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, their reliance on conventional hindsight relabeling often fails to correct subgoal infeasibility, leading to inefficient high-level planning. To address this, we propose Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable subgoals from admissible ones and streamline high-level decision making. FER delineates the reachability frontier from failure and partial-success transitions, identifying unreliable subgoals, increasing subgoal reliability, and reducing unnecessary high-level decisions. In addition, SSE employs a decoupled exploration policy to cover underexplored regions of the goal space and a path refinement strategy that adjusts edge costs based on observed low-level failures. Experimental results across diverse long-horizon benchmarks show that SSE consistently outperforms existing goal-conditioned and hierarchical RL methods in both efficiency and success rate.
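The following minimal sketch (ours, not the authors' code) illustrates the two mechanisms the abstract describes: storing a failed subgoal attempt as a terminal high-level transition with zero reward rather than hindsight-relabeling it as a success, and inflating graph edge costs with observed low-level failures. The buffer and function names and the penalty hyperparameter are assumptions introduced for illustration.

```python
# Illustrative sketch only -- not the paper's implementation. It shows:
# (1) strict subgoal execution: a failed subgoal is stored as a terminal
#     high-level transition with zero reward, with no hindsight relabeling
#     of the failed subgoal as an achieved goal;
# (2) path refinement: an edge's planning cost is inflated in proportion to
#     observed low-level failures along it.
# All names and the `penalty` hyperparameter are assumptions for this sketch.
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class HighLevelBuffer:
    # (state, subgoal, reward, next_state, done) tuples for the high-level policy
    transitions: List[Tuple[Any, Any, float, Any, bool]] = field(default_factory=list)

    def add(self, state: Any, subgoal: Any, reward: float,
            next_state: Any, done: bool) -> None:
        self.transitions.append((state, subgoal, reward, next_state, done))


def record_subgoal_attempt(buffer: HighLevelBuffer, state: Any, subgoal: Any,
                           next_state: Any, subgoal_reached: bool,
                           final_goal_reached: bool) -> None:
    """Store one high-level transition under strict subgoal execution."""
    if not subgoal_reached:
        # Terminal failure with zero reward; the failed subgoal is NOT
        # relabeled as a success.
        buffer.add(state, subgoal, 0.0, next_state, True)
    else:
        # Reward is granted only when the final goal is also reached.
        buffer.add(state, subgoal, float(final_goal_reached),
                   next_state, final_goal_reached)


def refine_edge_cost(base_cost: float, low_level_failures: int,
                     penalty: float = 1.0) -> float:
    """Inflate a graph edge's planning cost by observed low-level failures."""
    return base_cost + penalty * low_level_failures
```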
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 18077