Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Published: 20 Jul 2023, Last Modified: 31 Aug 2023 (EWRL 16)
Keywords: Hierarchical Reinforcement Learning, Sample Complexity
Abstract: Hierarchical Reinforcement Learning (HRL) algorithms can perform planning at multiple levels of abstraction. Empirical results have shown that state or temporal abstractions can significantly improve the sample efficiency of algorithms. Yet, our current understanding neither fully explains the basis of those efficiency gains nor provides theoretically grounded design rules. In this paper, we derive a lower bound on the sample complexity for a class of goal-conditioned HRL algorithms (e.g. Dot-2-Dot) that leads us to a novel Q-learning algorithm and establishes the relationship between the sample complexity and the nature of the decomposition. Specifically, the proposed lower bound on the sample complexity of such HRL algorithms allows us to quantify the benefits of hierarchical decomposition. We build upon this to formulate a simple Q-learning-type algorithm that leverages goal-hierarchical decomposition. We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks. The task design allows us to dial their complexity up or down over multiple orders of magnitude. Our theory and algorithmic findings provide a step towards answering the foundational question of quantifying the benefits that hierarchical decomposition provides over monolithic solutions in reinforcement learning.
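The abstract references a Q-learning-type algorithm that leverages goal-hierarchical decomposition but does not detail it here. As a rough illustration only, the sketch below shows a generic two-level goal-conditioned tabular Q-learning loop: a high-level Q-table proposes subgoals, and a low-level, goal-conditioned Q-table tries to reach each subgoal within a fixed horizon. The chain environment, rewards, and hyperparameters are assumptions made for this example, not taken from the paper.

```python
# Illustrative sketch of two-level goal-conditioned Q-learning on a toy chain task.
# NOT the paper's algorithm: environment, rewards, and hyperparameters are assumptions.
import numpy as np

N = 10            # chain states 0..N-1; the task goal is state N-1
HORIZON = 4       # low-level steps allotted per subgoal
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2
rng = np.random.default_rng(0)

q_hi = np.zeros((N, N))        # Q_hi[state, subgoal]
q_lo = np.zeros((N, N, 2))     # Q_lo[state, subgoal, action]; actions: 0 = left, 1 = right

def step(s, a):
    """Deterministic chain dynamics, clipped to [0, N-1]."""
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def eps_greedy(q_row):
    """Epsilon-greedy action/subgoal selection with random tie-breaking."""
    if rng.random() < EPS:
        return int(rng.integers(len(q_row)))
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

for episode in range(500):
    s = 0
    for _ in range(50):                          # cap episode length
        if s == N - 1:
            break
        g = eps_greedy(q_hi[s])                  # high level proposes a subgoal
        s_start = s
        for _ in range(HORIZON):                 # low level pursues the subgoal
            a = eps_greedy(q_lo[s, g])
            s2 = step(s, a)
            r_lo = 1.0 if s2 == g else 0.0       # intrinsic reward: subgoal reached
            target = r_lo + (0.0 if s2 == g else GAMMA * q_lo[s2, g].max())
            q_lo[s, g, a] += ALPHA * (target - q_lo[s, g, a])
            s = s2
            if s == g:
                break
        r_hi = 1.0 if s == N - 1 else 0.0        # extrinsic reward: task goal reached
        target_hi = r_hi + (0.0 if s == N - 1 else GAMMA * q_hi[s].max())
        q_hi[s_start, g] += ALPHA * (target_hi - q_hi[s_start, g])

print("Greedy subgoal from start state:", int(np.argmax(q_hi[0])))
```

A more faithful semi-MDP update would discount the high-level target by the number of low-level steps actually taken; the single-step discount above is a simplification that keeps the sketch short.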