Abstract: Hierarchical Reinforcement Learning (HRL) offers a promising framework for solving complex, long-horizon tasks by decomposing them into manageable subproblems. However, conventional HRL methods suffer from a critical non-stationarity problem: the high-level planner's learning process is destabilized because the low-level policy is concurrently learning and constantly changing. This issue is particularly severe in resource-constrained systems, such as edge-cloud robotics, where the low-level controller must be a computationally simple, low-capacity model.
To address this challenge, we propose a novel HRL framework that resolves the non-stationarity issue by decoupling high-level planning from low-level control. The core of our approach is to reframe the planner's task: instead of learning the planner via RL on non-stationary transitions, it learns to navigate a stable "map" of the environment. This map is represented by a critic network trained to function as a metric space, where distances reflect optimal travel costs. Planning then reduces to selecting subgoals that lie along the shortest path (geodesic) between the current state and the final goal. To further improve the accuracy of this map, we introduce a trajectory regularization loss that enforces geometric consistency along the agent's experienced trajectories.
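As a rough illustration of the geodesic-style planning described above (a minimal sketch, not the paper's implementation), a subgoal on the shortest path is one that minimizes the sum of critic distances from the current state to the subgoal and from the subgoal to the goal. Here `critic_distance` is a hypothetical stand-in for the learned metric critic, using plain Euclidean distance, and the candidate set is assumed given:

```python
import numpy as np

def critic_distance(s, g):
    """Toy stand-in for the learned critic d(s, g): Euclidean distance
    is used here purely for illustration; the paper's critic would be a
    trained network whose outputs approximate optimal travel costs."""
    return float(np.linalg.norm(np.asarray(s) - np.asarray(g)))

def select_subgoal(state, goal, candidates):
    """Pick the candidate subgoal closest to the geodesic: the one that
    minimizes d(state, sg) + d(sg, goal)."""
    costs = [critic_distance(state, sg) + critic_distance(sg, goal)
             for sg in candidates]
    return candidates[int(np.argmin(costs))]

state, goal = (0.0, 0.0), (4.0, 0.0)
candidates = [(2.0, 0.0), (2.0, 3.0), (-1.0, 0.0)]
print(select_subgoal(state, goal, candidates))  # (2.0, 0.0), the midpoint on the straight-line geodesic
```

Because the critic is trained once to be a consistent metric, this selection rule stays stable even while the low-level policy continues to change, which is the decoupling the abstract refers to.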
Experiments demonstrate that our decoupled framework is highly robust. In scenarios with resource-constrained low-level policies, our method learns to solve complex tasks effectively where standard approaches fail. This result highlights our framework's suitability for real-world systems where low-level controllers have inherently limited computational capacity.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 6889