Decoupling Planning from Control: Stable Hierarchical RL with a Learned Metric Space

07 Jan 2026 (modified: 16 Apr 2026) · Decision pending for TMLR · CC BY 4.0
Abstract: Hierarchical Reinforcement Learning (HRL) offers a promising framework for solving complex, long-horizon tasks by decomposing them into manageable subproblems. However, conventional HRL methods suffer from a critical non-stationarity problem: the high-level planner's learning is destabilized because the low-level policy is concurrently learning and therefore constantly changing. This issue is especially severe in resource-constrained systems, such as edge-cloud robotics, where the low-level controller must be a computationally simple, low-capacity model. To address this challenge, we propose a novel HRL framework that mitigates non-stationarity by decoupling high-level planning from low-level control. The core of our approach is to reframe the planner's task: instead of learning the planner via RL on non-stationary transitions, we train it to navigate a learned "map" of the environment. This map is represented by a critic network trained to act as a metric space in which distances reflect approximate travel costs. Planning then reduces to finding subgoals that lie along the shortest path (geodesic) between the current state and the final goal. To further encourage geometric consistency in the learned map, we introduce a trajectory regularization loss based on the agent's experienced trajectories. Experiments demonstrate that our decoupled framework is highly robust: with resource-constrained low-level policies, it learns to solve complex tasks effectively where standard approaches fail. This result highlights the framework's suitability for real-world systems whose low-level controllers have inherently limited computational capacity.
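The two mechanisms named in the abstract (geodesic subgoal selection over a metric-space critic, and a trajectory-based regularizer) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the discrete candidate set, the hinge form of the regularizer, and the toy Euclidean critic standing in for the learned network are all assumptions made for the example.

```python
import numpy as np

def select_subgoal(critic, state, goal, candidates):
    """Pick the candidate subgoal lying closest to the geodesic between
    state and goal, i.e. minimizing d(state, sg) + d(sg, goal)."""
    costs = [critic(state, sg) + critic(sg, goal) for sg in candidates]
    return candidates[int(np.argmin(costs))]

def trajectory_regularization(critic, trajectory):
    """Hinge penalty encouraging geometric consistency with experience:
    the predicted distance d(s_t, s_{t+k}) should not exceed the number
    of steps k the agent actually took between the two states."""
    loss = 0.0
    T = len(trajectory)
    for t in range(T):
        for k in range(1, T - t):
            loss += max(0.0, critic(trajectory[t], trajectory[t + k]) - k)
    return loss

# Toy stand-in for the learned critic: Euclidean distance on 2-D states.
critic = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

state, goal = (0.0, 0.0), (4.0, 0.0)
candidates = [(2.0, 0.0), (2.0, 3.0), (5.0, 5.0)]
print(select_subgoal(critic, state, goal, candidates))  # → (2.0, 0.0)
```

The subgoal on the straight line between state and goal wins because detours strictly increase the summed distance; with a learned (non-Euclidean) critic the same argmin instead follows the environment's actual travel-cost geometry.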
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 6889