Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning

Published: 22 Jan 2025, Last Modified: 10 Mar 2025, AISTATS 2025 Poster. License: CC BY 4.0
Abstract: We study goal-conditioned Hierarchical Reinforcement Learning (HRL), in which a high-level agent assigns sub-goals to a low-level agent. Under the assumptions of a sparse reward function and a known hierarchical decomposition, we propose a new algorithm for learning optimal hierarchical policies. Our algorithm takes a low-level policy as input and is flexible enough to work with a wide range of low-level policies. We show that when the algorithm computing the low-level policy is optimistic and provably efficient, our HRL algorithm enjoys a regret bound that significantly improves on previous results for HRL. Importantly, the regret upper bound highlights the key characteristics of the hierarchical decomposition under which our hierarchical algorithm is guaranteed to be more efficient than the best monolithic approach. We support our theoretical findings with experiments showing that our method consistently outperforms algorithms that ignore the hierarchical structure.
Submission Number: 767
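
For intuition, below is a minimal, illustrative Python sketch of the goal-conditioned hierarchical control loop described in the abstract: a high-level agent selects sub-goals, a given low-level policy is used to pursue each sub-goal, and the environment reward is sparse. All names (HighLevelAgent, low_level_policy, the toy chain dynamics) and the sub-goal selection rule are assumptions for illustration only; this is not the paper's algorithm or its regret-efficient sub-goal selection scheme.

```python
import random


class HighLevelAgent:
    """Toy sub-goal selector with optimistic initial value estimates.

    A real high-level agent would condition on the state; this placeholder
    keeps one scalar estimate per sub-goal for simplicity.
    """

    def __init__(self, subgoals):
        self.subgoals = subgoals
        self.value = {g: 1.0 for g in subgoals}   # optimistic initialization
        self.counts = {g: 0 for g in subgoals}

    def select_subgoal(self, state):
        # Pick the sub-goal with the highest estimate; tiny noise breaks ties.
        return max(self.subgoals,
                   key=lambda g: self.value[g] + 1e-3 * random.random())

    def update(self, subgoal, segment_return):
        # Running average of returns collected while pursuing this sub-goal.
        self.counts[subgoal] += 1
        n = self.counts[subgoal]
        self.value[subgoal] += (segment_return - self.value[subgoal]) / n


def low_level_policy(state, subgoal):
    """Placeholder low-level policy: step one unit toward the sub-goal."""
    if subgoal == state:
        return 0
    return 1 if subgoal > state else -1


def run_episode(agent, horizon=50, goal=10):
    """One episode on a toy 1-D chain with a sparse reward at the goal."""
    state, total_reward, segment_return = 0, 0.0, 0.0
    subgoal = agent.select_subgoal(state)
    for _ in range(horizon):
        state += low_level_policy(state, subgoal)   # toy chain dynamics
        reward = 1.0 if state == goal else 0.0      # sparse reward
        total_reward += reward
        segment_return += reward
        if state == subgoal or reward > 0:          # sub-goal reached: re-plan
            agent.update(subgoal, segment_return)
            subgoal = agent.select_subgoal(state)
            segment_return = 0.0
    return total_reward


if __name__ == "__main__":
    agent = HighLevelAgent(subgoals=list(range(0, 11, 2)))
    returns = [run_episode(agent) for _ in range(20)]
    print("episode returns:", returns)
```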