Keywords: goal-conditioned, reinforcement learning, hierarchical, sample efficiency, transfer, graph-based
TL;DR: Introducing hierarchical policy transfer to GCRL, leading to substantial gains in sample efficiency.
Abstract: Goal-Conditioned Reinforcement Learning (GCRL) tackles the challenging problem of long-horizon, sparse-reward goal-reaching tasks with continuous actions. Recent methods, relying on a two-level hierarchical policy along with a graph of sub-goal landmarks, have demonstrated reasonable asymptotic performance. However, existing algorithms suffer from poor sample efficiency because the low-level policy must be trained from scratch, concurrently with the high-level policy, for each given task. We instead claim that transferring a pre-trained low-level policy between environments can dramatically improve sample efficiency and even success rates. We introduce PROMO, an algorithm consisting of a transferable low-level GCRL policy and a high-level graph-based planner. Our self-terminating landmark generation procedure progressively covers the entire goal space with landmarks selected for novelty and reachability. We demonstrate 3-4x improvements in sample efficiency over existing state-of-the-art methods on the challenging robotics tasks of AntMaze and Reacher3D, with the mild overhead of one-time policy pre-training. In addition, our method achieves a near-100% success rate in almost all environments, as well as better training stability and far fewer, more informative landmarks.
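To illustrate the idea of a self-terminating, novelty-and-reachability-based landmark procedure described in the abstract, here is a minimal hypothetical sketch. The names `goal_samples`, `is_reachable`, and `novelty_radius` are illustrative assumptions, not the paper's actual interface; the real procedure may differ.

```python
import numpy as np

def generate_landmarks(goal_samples, is_reachable, novelty_radius=1.0, max_rounds=50):
    """Hypothetical sketch: add a candidate goal as a landmark only if it is
    novel (far from existing landmarks) and reachable by the pre-trained
    low-level policy. Stops once a full round adds no new landmark."""
    landmarks = []
    for _ in range(max_rounds):
        added = False
        for g in goal_samples():                       # draw candidate goals
            if landmarks:
                dists = np.linalg.norm(np.array(landmarks) - g, axis=1)
                if dists.min() < novelty_radius:       # not novel: skip
                    continue
            if is_reachable(g):                        # reachability check, e.g. a value estimate
                landmarks.append(g)
                added = True
        if not added:                                  # self-terminating: goal space covered
            break
    return np.array(landmarks)
```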
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23093