Keywords: goal-conditioned, reinforcement learning, hierarchical, sample efficiency, transfer, graph-based
TL;DR: Introducing hierarchical policy transfer to GCRL, leading to substantial gains in sample efficiency.
Abstract: Goal-Conditioned Reinforcement Learning (GCRL) tackles the challenging problem of long-horizon, sparse-reward goal-reaching tasks with continuous actions. Recent methods, relying on a two-level hierarchical policy along with a graph of sub-goal landmarks, have demonstrated reasonable asymptotic performance. However, existing algorithms suffer from poor sample efficiency because the low-level policy must be trained from scratch, concurrently with the high-level policy, for each given task. We instead claim that transferring a pre-trained low-level policy between environments can dramatically improve sample efficiency and even success rates. We introduce PROMO, an algorithm consisting of a transferable low-level GCRL policy and a high-level graph-based planner. Our self-terminating landmark generation procedure progressively covers the entire goal space with landmarks selected for novelty and reachability. We demonstrate 3-4x improvements in sample efficiency over existing state-of-the-art methods on the challenging robotics tasks of AntMaze and Reacher3D, with the mild overhead of one-time policy pre-training. In addition, our method achieves a near-100% success rate in almost all environments, as well as better training stability and far fewer, more informative landmarks.
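To illustrate the idea of a self-terminating, novelty-and-reachability-based landmark procedure described in the abstract, here is a minimal hypothetical sketch. The names `goal_samples`, `is_reachable`, and `novelty_radius` are illustrative assumptions, not the paper's actual interface; the real procedure may differ.

```python
import numpy as np

def generate_landmarks(goal_samples, is_reachable, novelty_radius=1.0, max_rounds=50):
    """Hypothetical sketch: add a candidate goal as a landmark only if it is
    novel (far from existing landmarks) and reachable by the pre-trained
    low-level policy. Stops once a full round adds no new landmark."""
    landmarks = []
    for _ in range(max_rounds):
        added = False
        for g in goal_samples():                       # draw candidate goals
            if landmarks:
                dists = np.linalg.norm(np.array(landmarks) - g, axis=1)
                if dists.min() < novelty_radius:       # not novel: skip
                    continue
            if is_reachable(g):                        # reachability check, e.g. a value estimate
                landmarks.append(g)
                added = True
        if not added:                                  # self-terminating: goal space covered
            break
    return np.array(landmarks)
```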
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23093