CURATE: Automatic Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search
Keywords: curriculum learning, reinforcement learning
TL;DR: CURATE trains RL agents to complete difficult target tasks by learning a curriculum that dynamically scales the task difficulty to the current capabilities of the agent.
Abstract: Without informed priors or specialized algorithms, reinforcement learning agents face fundamental exploration challenges on difficult tasks: they may rarely receive informative rewards, leading to inefficient learning. To address these challenges, we introduce CURATE, an automatic curriculum learning algorithm for reinforcement learning agents designed for difficult target task distributions. Through "exploration by exploitation," CURATE dynamically scales the task difficulty to match the agent's current competence. By exploiting capabilities acquired in easier tasks, the agent improves its exploration of more difficult ones. Our key insight is that the performance gain on tasks close to those used for training is inversely proportional to their difficulty, so an agent that at any given time trains on a nearby distribution of the easiest unsolved tasks automatically induces an easiest-to-hardest curriculum. To achieve this, CURATE conducts policy search in the task space to learn the best task distribution for training the agent. As the agent's mastery grows, the learned curriculum adapts in an approximately easiest-to-hardest, task-directed fashion, efficiently culminating in an agent that can solve the target tasks. Our experiments across three domains of varying task parameterization and dimensionality demonstrate that CURATE learns highly effective curricula, matching or exceeding prior curriculum methods in target task performance. Moreover, CURATE curricula are effective beyond solving the difficult target tasks, yielding broadly capable agents.
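To make the "easiest unsolved tasks" idea concrete, here is a minimal sketch of a competence-based selection rule. All names, thresholds, and the hard-selection logic are illustrative assumptions; CURATE itself performs policy search over task distributions rather than applying a fixed rule like this.

```python
def select_curriculum_tasks(success_rates, difficulties, mastery=0.9, k=3):
    """Pick the k easiest tasks the agent has not yet mastered.

    success_rates: per-task empirical success estimates in [0, 1]
    difficulties: per-task difficulty scores (lower = easier)
    mastery: success threshold above which a task counts as solved
    NOTE: hypothetical sketch of easiest-unsolved selection, not CURATE.
    """
    # Tasks whose success rate is still below the mastery threshold.
    unsolved = [i for i, s in enumerate(success_rates) if s < mastery]
    # Sort unsolved tasks from easiest to hardest and keep the first k.
    unsolved.sort(key=lambda i: difficulties[i])
    return unsolved[:k]

# Example: five tasks of increasing difficulty; task 0 is already mastered.
rates = [0.95, 0.8, 0.4, 0.1, 0.0]
diffs = [1, 2, 3, 4, 5]
print(select_curriculum_tasks(rates, diffs))  # -> [1, 2, 3]
```

As the agent's success rates rise on the selected tasks, the unsolved set shifts toward harder tasks, which is the easiest-to-hardest progression the abstract describes.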
Primary Area: reinforcement learning
Submission Number: 15449