A tale of two goals: leveraging short term goals performs best in multi-goal scenarios

TMLR Paper5924 Authors

18 Sept 2025 (modified: 26 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: When an agent must learn to reach faraway goals, several hierarchical reinforcement learning methods leverage planning to create a sequence of intermediate goals that guides a lower-level goal-conditioned policy. The low-level policy is typically conditioned on the current goal, with the aim of reaching it as quickly as possible. However, this approach can fail when an intermediate goal can be reached in multiple ways, some of which prevent the agent from continuing toward subsequent goals. To address this issue, we introduce an enriched Markov Decision Process (MDP) framework in which the optimization objective accounts not only for reaching the current goal but also for subsequent ones. Within this framework, we can specify which goals the agent should prepare to achieve ahead of time. To study the impact of this design, we conduct a series of experiments on navigation, balancing, and locomotion tasks in which sequences of intermediate goals are given. By evaluating policies trained with an off-policy actor-critic algorithm under both the standard goal-conditioned MDP framework and ours, we show that, in most cases, preparing to reach the next two goals improves stability and sample efficiency over all other approaches.
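To make the central idea concrete, the sketch below illustrates one way a low-level policy could be conditioned on the next k intermediate goals rather than only the current one. This is a minimal illustration under assumed conventions (the class name, the padding scheme, and the distance-based goal-completion test are hypothetical, not the authors' exact formulation).

```python
import numpy as np

class MultiGoalConditioning:
    """Illustrative helper: augment the agent's observation with the next
    k intermediate goals, so the policy can reach the current goal in a
    way that keeps subsequent goals reachable. Names and conventions here
    are assumptions for exposition, not the paper's implementation."""

    def __init__(self, goal_sequence, k=2, tol=0.5):
        self.goals = np.asarray(goal_sequence, dtype=np.float64)  # (n, d)
        self.k = k          # number of upcoming goals the policy sees
        self.tol = tol      # distance threshold for "goal reached"
        self.idx = 0        # index of the current intermediate goal

    def conditioned_obs(self, state):
        """Concatenate the state with the next k goals, padding with the
        final goal once the sequence runs out."""
        upcoming = [self.goals[min(self.idx + i, len(self.goals) - 1)]
                    for i in range(self.k)]
        return np.concatenate([state, *upcoming])

    def advance_if_reached(self, state):
        """Move on to the next goal once the current one is within tol
        (assumes the first d state components are the agent's position)."""
        d = self.goals.shape[1]
        if np.linalg.norm(state[:d] - self.goals[self.idx]) < self.tol:
            self.idx = min(self.idx + 1, len(self.goals) - 1)

# Usage: k=1 recovers the standard goal-conditioned setting; k=2 matches
# the configuration the abstract reports as performing best.
cond = MultiGoalConditioning(goal_sequence=[[0.0, 1.0], [2.0, 1.0], [2.0, 3.0]], k=2)
obs = cond.conditioned_obs(np.array([0.0, 0.0, 0.1, 0.0]))  # state + next 2 goals
```

Setting k=1 here degenerates to the standard goal-conditioned MDP, which is the baseline the paper compares against.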
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Martha_White1
Submission Number: 5924