Keywords: Discrete planning, Hierarchical Reinforcement Learning, Reinforcement Learning
TL;DR: DHP is a novel planning method that uses discrete reachability checks instead of distance metrics. It encourages short plans and generalizes beyond training depth. A memory-based explorer collects training data for efficient learning.
Abstract: Hierarchical Reinforcement Learning (HRL) agents often fail in long-horizon visual planning because they rely on error-prone distance metrics to choose subgoals. We introduce Discrete Hierarchical Planning (DHP), which evaluates subgoal feasibility using reachability checks instead of continuous distance estimates. DHP builds tree-structured plans that decompose goals into simpler subtasks and employs a λ-return update that naturally favors shallow decompositions and generalizes beyond training depth. To improve data efficiency, we add an intrinsic exploration policy that automatically generates informative trajectories for training the planner. In a 25-room navigation benchmark, DHP achieves 100% success (vs. 90%) and shorter episode lengths. The method also extends to momentum-based control tasks and requires only O(log N) replanning steps.
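The core idea of recursive subgoal decomposition with reachability checks can be illustrated with a minimal sketch. This is a conceptual illustration only, not the paper's implementation: the helpers `is_reachable` and `propose_subgoal` are hypothetical stand-ins for the learned reachability check and subgoal proposal policy described in the abstract.

```python
def plan(state, goal, depth, max_depth, is_reachable, propose_subgoal):
    """Recursively split (state, goal) into subtasks until each leg is reachable.

    Returns a list of waypoints ending at `goal`, or None if no plan is found
    within the depth budget. `is_reachable` and `propose_subgoal` are assumed
    callables standing in for learned components.
    """
    # Base case: a discrete reachability check replaces a learned distance metric.
    if is_reachable(state, goal):
        return [goal]
    if depth >= max_depth:
        return None

    # Propose an intermediate subgoal and solve the two halves independently,
    # forming a binary plan tree whose depth grows logarithmically in plan length.
    subgoal = propose_subgoal(state, goal)
    left = plan(state, subgoal, depth + 1, max_depth, is_reachable, propose_subgoal)
    right = plan(subgoal, goal, depth + 1, max_depth, is_reachable, propose_subgoal)
    if left is None or right is None:
        return None
    return left + right
```

Because each recursion level splits the remaining task in two, a plan covering N steps needs a tree of depth O(log N), which is consistent with the replanning cost stated above.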
Submission Number: 80