Keywords: reinforcement learning, curriculum learning, goal-conditioned reinforcement learning
Abstract: Goal-conditioned reinforcement learning (RL) tackles the problem of training an RL agent to reach multiple goals in an environment, often with sparse rewards administered only upon reaching the goal.
In this setting, automatic curriculum learning can improve an agent's learning by sampling goals in a structured order tailored to the agent's current ability.
This work presents two contributions to improve learning in goal-conditioned RL environments.
First, we present a simple, algorithm-agnostic technique to accelerate learning by continuous goal sampling, in which an agent's goals are sampled and changed multiple times within a single episode.
Such continuous goal sampling enables faster exploration of the goal space and allows curriculum methods to have a more significant impact on an agent's learning.
Second, we propose VDIFF, an automatic curriculum learning method that uses an agent's value function to create a self-paced curriculum by sampling goals on which the agent is demonstrating high learning progress.
Through results on 17 multi-goal robotic environments and navigation tasks, we show that continuous goal sampling and VDIFF work synergistically and result in performance gains over current state-of-the-art methods.
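The abstract describes VDIFF only at a high level: goals are sampled where the agent's value function indicates high learning progress. The sketch below illustrates one way such a curriculum could work. It is not the authors' implementation. It assumes that learning progress for a goal is measured as the absolute change between two snapshots of the value estimate for that goal, and that goals are sampled in proportion to that change. The names `progress_weights`, `sample_goal_by_progress`, and `eps` are hypothetical, and the value numbers are made up for illustration.

```python
import random


def progress_weights(v_old, v_new, eps=1e-3):
    """Map per-goal value changes to a sampling distribution.

    Assumption: learning progress is |V_new(g) - V_old(g)|.
    """
    progress = [abs(new - old) for old, new in zip(v_old, v_new)]
    # eps keeps a small probability on goals with no measured progress.
    weights = [p + eps for p in progress]
    total = sum(weights)
    return [w / total for w in weights]


def sample_goal_by_progress(goals, v_old, v_new, rng):
    """Draw one goal with probability proportional to its progress weight."""
    probs = progress_weights(v_old, v_new)
    return rng.choices(goals, weights=probs, k=1)[0]


# Illustrative values: the "medium" goal has changed the most between snapshots.
goals = ["near", "medium", "far"]
v_old = [0.90, 0.20, 0.00]
v_new = [0.92, 0.55, 0.01]

probs = progress_weights(v_old, v_new)
picked = sample_goal_by_progress(goals, v_old, v_new, random.Random(0))
```

Under these assumptions, a goal that is already mastered ("near") and one that is still out of reach ("far") both receive little probability mass, so sampling concentrates on goals of intermediate difficulty.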
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
TL;DR: We present continuous goal sampling, an extension of goal-conditioned RL that accelerates a wide range of curriculum learning algorithms.
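As a rough illustration of continuous goal sampling, the sketch below runs one episode on a toy 1-D line and replaces the goal several times within that episode. It is a minimal sketch under stated assumptions, not the paper's code. The resampling trigger (on success or every `resample_every` steps), the greedy placeholder policy, and the names `rollout` and `uniform_goal` are all hypothetical. Any curriculum method could be passed in as `sample_goal`.

```python
import random


def rollout(sample_goal, horizon=50, resample_every=10, size=20, seed=0):
    """Run one episode, switching goals multiple times within it."""
    rng = random.Random(seed)
    pos = 0
    goal = sample_goal(rng, size)
    goals_seen = [goal]
    reached = 0
    for t in range(1, horizon + 1):
        # Placeholder policy: step greedily toward the current goal.
        if pos < goal:
            pos += 1
        elif pos > goal:
            pos -= 1
        # Sparse reward: 1 only while the agent is at the goal.
        reward = 1 if pos == goal else 0
        reached += reward
        # Assumed trigger: resample on success or every `resample_every` steps.
        if reward or t % resample_every == 0:
            goal = sample_goal(rng, size)
            goals_seen.append(goal)
    return goals_seen, reached


def uniform_goal(rng, size):
    """Baseline sampler: pick any cell on the line uniformly at random."""
    return rng.randrange(size)


goals_seen, reached = rollout(uniform_goal)
```

With one fixed goal per episode, the sampler would be queried once per episode. Here it is queried at least once every `resample_every` steps, which gives the goal-sampling strategy more opportunities to influence the data the agent collects.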