A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

Published: 20 Jul 2023, Last Modified: 29 Aug 2023 · EWRL16 · Readers: Everyone
Keywords: Hierarchical Reinforcement Learning, Regret Guarantees, Options
TL;DR: A theoretical analysis to investigate when a hierarchical approach should be preferred to a standard one
Abstract: Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the *option* framework, previous works have conceived provably efficient algorithms for the case in which the options are *fixed* and only the high-level policy selecting among options has to be learned. However, the fully realistic scenario in which *both* the high-level and the low-level policies are learned has, surprisingly, been disregarded from a theoretical perspective. This work takes a step towards understanding this latter scenario. Focusing on the finite-horizon problem, we propose a novel meta-algorithm that alternates between two regret minimization algorithms instantiated at different (high and low) temporal abstractions. At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), keeping the low-level policies fixed, while at the lower level, we learn the inner option policies while keeping the high-level policy fixed. We then specialize the results for a specific choice of algorithms: we propose a novel provably efficient algorithm for finite-horizon SMDPs and use a state-of-the-art regret minimizer for learning the options. We compare the derived bounds with those of state-of-the-art regret minimization algorithms for non-hierarchical finite-horizon problems. The comparison allows us to characterize the class of problems in which a hierarchical approach is provably preferable, even when a set of pre-trained options is not given.
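The alternation scheme described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the paper's algorithm: the actual high-level SMDP regret minimizer and low-level option learner are not specified here, so both are modeled as hypothetical `Learner` placeholders, and the phase lengths are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-in for a regret minimization algorithm: it only
# records which episodes it was trained on, so we can inspect the
# alternation pattern. The real learners (the SMDP-level algorithm and
# the option-level regret minimizer) would update policies here.
@dataclass
class Learner:
    name: str
    episodes_seen: List[int] = field(default_factory=list)

    def train(self, episode: int) -> None:
        # Placeholder for one episode of regret minimization.
        self.episodes_seen.append(episode)


def alternating_meta_algorithm(
    num_phases: int, phase_length: int
) -> Tuple[Learner, Learner]:
    """Alternate between a high-level learner that treats the current
    low-level option policies as fixed (viewing the problem as an SMDP)
    and a low-level learner that improves the inner option policies
    under a fixed high-level policy. Phase lengths are illustrative."""
    high = Learner("high-level (SMDP)")
    low = Learner("low-level (options)")
    episode = 0
    for phase in range(num_phases):
        # Even phases train the high level, odd phases the low level;
        # the inactive side is held fixed for the whole phase.
        active = high if phase % 2 == 0 else low
        for _ in range(phase_length):
            active.train(episode)
            episode += 1
    return high, low
```

For example, with 4 phases of 3 episodes each, the high-level learner sees episodes 0-2 and 6-8 while the low-level learner sees 3-5 and 9-11, mirroring the fixed-one-learn-the-other alternation the abstract describes.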