Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search
Track: Robotics
Keywords: curriculum learning, reinforcement learning, exploration by exploitation
TL;DR: CURATE trains RL agents to complete difficult target tasks by learning a curriculum that dynamically scales the task difficulty to the current capabilities of the agent.
Abstract: We present CURATE, an algorithm for automatic curriculum learning that trains reinforcement learning agents to solve a difficult target task distribution with sparse rewards. Initially, due to fundamental exploration challenges without informed priors or specialized algorithms, agents may be unable to consistently receive rewards, leading to inefficient learning. Through "exploration by exploitation," CURATE dynamically scales the task difficulty to match the agent's current competence. By exploiting the capabilities it acquired in easier tasks, the agent improves its exploration of more difficult tasks. While training the agent, CURATE conducts policy search in the curriculum space to learn a task distribution concentrated on the easiest tasks that the agent has not yet solved. As the agent's mastery grows, the learned curriculum adapts correspondingly in an approximately easiest-to-hardest fashion, efficiently culminating in an agent that can solve the target tasks. Our experiments demonstrate that the curricula learned by CURATE achieve greater sample efficiency for solving the target tasks than state-of-the-art algorithms and most baselines. Although a hand-designed curriculum outperformed CURATE on one-dimensional curricula, CURATE excels in two-dimensional curriculum spaces where the optimal task sequencing is not obvious.
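The competence-based scaling the abstract describes can be illustrated with a toy sketch. This is not the authors' algorithm: CURATE performs policy search over curriculum distributions, whereas the snippet below uses a simple promote-on-mastery rule with a simulated agent (all names, thresholds, and the success model are hypothetical) just to show the easiest-to-hardest dynamic.

```python
import random

def run_curriculum(target_difficulty=10, episodes_per_eval=20,
                   promote_threshold=0.8, seed=0):
    """Toy competence-based curriculum: raise task difficulty only once the
    agent reliably solves the current level, so skills exploited from easier
    tasks bootstrap exploration of harder ones ("exploration by exploitation"
    in spirit). Purely illustrative -- not the CURATE policy-search procedure.
    """
    rng = random.Random(seed)
    competence = 0.0   # stand-in for the agent's learned skill level
    difficulty = 1     # current curriculum level
    history = []       # (difficulty, success rate) per evaluation round
    while difficulty <= target_difficulty:
        successes = 0
        for _ in range(episodes_per_eval):
            # Simulated agent: success is likelier when competence is close
            # to (or above) the current task difficulty.
            p = min(1.0, max(0.05, 1.0 - 0.25 * (difficulty - competence)))
            if rng.random() < p:
                successes += 1
                # "Training": each success nudges competence upward.
                competence += 0.05 * (difficulty - competence + 1.0)
        rate = successes / episodes_per_eval
        history.append((difficulty, round(rate, 2)))
        if rate >= promote_threshold:
            difficulty += 1  # level mastered; move to harder tasks
    return history

if __name__ == "__main__":
    for level, rate in run_curriculum():
        print(f"level {level}: success rate {rate}")
```

The key design choice mirrored here is that difficulty never jumps ahead of demonstrated competence, which is what keeps rewards reachable under sparse-reward conditions.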
Serve As Reviewer: ~Tabitha_Edith_Lee1, ~Esra'a_Saleh1
Submission Number: 89