Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search
Track: Robotics
Keywords: curriculum learning, reinforcement learning, exploration by exploitation
TL;DR: CURATE trains RL agents to complete difficult target tasks by learning a curriculum that dynamically scales the task difficulty to the current capabilities of the agent.
Abstract: We present CURATE, an algorithm for automatic curriculum learning that trains reinforcement learning agents to solve a difficult target task distribution with sparse rewards. Initially, due to fundamental exploration challenges without informed priors or specialized algorithms, agents may be unable to consistently receive rewards, leading to inefficient learning. Through "exploration by exploitation," CURATE dynamically scales the task difficulty to match the agent's current competence. By exploiting the capabilities it acquired in easier tasks, the agent improves its exploration of more difficult tasks. While training the agent, CURATE conducts policy search in the curriculum space to learn a task distribution concentrated on the easiest tasks that the agent has not yet solved. As the agent's mastery grows, the learned curriculum adapts correspondingly in an approximately easiest-to-hardest fashion, efficiently culminating in an agent that can solve the target tasks. Our experiments demonstrate that the curricula learned by CURATE achieve greater sample efficiency for solving the target tasks than state-of-the-art algorithms and most baselines. Although a hand-designed curriculum outperformed CURATE on one-dimensional curricula, CURATE excels in two-dimensional curriculum spaces where the optimal task sequencing is not obvious.
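The competence-based scaling the abstract describes can be illustrated with a toy sketch. This is not the authors' algorithm: CURATE performs policy search over curriculum distributions, whereas the snippet below uses a simple promote-on-mastery rule with a simulated agent (all names, thresholds, and the success model are hypothetical) just to show the easiest-to-hardest dynamic.

```python
import random

def run_curriculum(target_difficulty=10, episodes_per_eval=20,
                   promote_threshold=0.8, seed=0):
    """Toy competence-based curriculum: raise task difficulty only once the
    agent reliably solves the current level, so skills exploited from easier
    tasks bootstrap exploration of harder ones ("exploration by exploitation"
    in spirit). Purely illustrative -- not the CURATE policy-search procedure.
    """
    rng = random.Random(seed)
    competence = 0.0   # stand-in for the agent's learned skill level
    difficulty = 1     # current curriculum level
    history = []       # (difficulty, success rate) per evaluation round
    while difficulty <= target_difficulty:
        successes = 0
        for _ in range(episodes_per_eval):
            # Simulated agent: success is likelier when competence is close
            # to (or above) the current task difficulty.
            p = min(1.0, max(0.05, 1.0 - 0.25 * (difficulty - competence)))
            if rng.random() < p:
                successes += 1
                # "Training": each success nudges competence upward.
                competence += 0.05 * (difficulty - competence + 1.0)
        rate = successes / episodes_per_eval
        history.append((difficulty, round(rate, 2)))
        if rate >= promote_threshold:
            difficulty += 1  # level mastered; move to harder tasks
    return history

if __name__ == "__main__":
    for level, rate in run_curriculum():
        print(f"level {level}: success rate {rate}")
```

The key design choice mirrored here is that difficulty never jumps ahead of demonstrated competence, which is what keeps rewards reachable under sparse-reward conditions.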
Serve As Reviewer: ~Tabitha_Edith_Lee1, ~Esra'a_Saleh1
Submission Number: 89