Stochastic Subgoal Representation for Hierarchical Reinforcement Learning

Vivienne Huiling Wang; Tinghuai Wang; Wenyan Yang; Joni-kristian Kamarainen; Joni Pajarinen

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: reinforcement learning, hierarchical reinforcement learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: This paper introduces a Gaussian process (GP)-based Bayesian approach to learn stochastic subgoal representations for HRL.

Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) promises to make long-horizon decision-making feasible by giving high-level policies a latent subgoal space, which reduces the effective planning horizon. However, existing methods use deterministic subgoal representations, which may hinder the stability and efficiency of hierarchical policy learning. This paper introduces a Gaussian process (GP)-based Bayesian approach to learn stochastic subgoal representations. Our method learns a posterior distribution over the latent subgoal space, using GPs to account for the stochastic uncertainty in the learned representation and thereby facilitating improved exploration. Moreover, our approach provides an adaptive memory that integrates long-range subgoal information from prior planning steps. This improves the representation in novel state regions and strengthens robustness to environmental stochasticity. In experiments, our approach surpasses state-of-the-art HRL methods in both deterministic and stochastic settings, with both dense and sparse external rewards. Additionally, we demonstrate that our approach allows low-level policies to be transferred across tasks.
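The abstract does not spell out the inference step. As a generic, hedged illustration of what a GP posterior provides (a mean plus an uncertainty that grows away from observed data), the sketch below computes the standard zero-mean GP regression posterior with an RBF kernel on scalar inputs. It is not the authors' method: the kernel choice, the hyperparameters (`lengthscale`, `variance`, `noise`), the helper names (`rbf_kernel`, `gp_posterior`, etc.), and the toy data are all hypothetical.

```python
import math

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(m):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix m = L L^T."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kern):
    """Posterior mean and latent-function variance of a zero-mean GP at one test input."""
    n = len(x_train)
    # Gram matrix with observation noise on the diagonal: K + noise * I
    K = [[rbf_kernel(x_train[i], x_train[j], **kern) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_train))  # alpha = K^{-1} y
    k_star = [rbf_kernel(x_test, xi, **kern) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))  # k_*^T K^{-1} y
    v = forward_sub(L, k_star)  # v = L^{-1} k_*
    var = rbf_kernel(x_test, x_test, **kern) - sum(vi * vi for vi in v)  # k_** - k_*^T K^{-1} k_*
    return mean, var

# Toy usage: uncertainty is small near observed inputs and large far from them.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
m_near, v_near = gp_posterior(xs, ys, 1.0)
m_far, v_far = gp_posterior(xs, ys, 10.0)
```

Far from the data the posterior reverts to the prior (mean near 0, variance near the kernel `variance`), which is the kind of calibrated uncertainty the abstract associates with exploration in unfamiliar state regions.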

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5305