Trajectory First: A Curriculum for Discovering Diverse Policies

ICLR 2026 Conference Submission 18218 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Exploration, Diversity, Robotics
Abstract: Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework for training a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To improve diversity optimization in RL, we therefore propose a two-stage curriculum. The key idea of our method is to leverage a spline-based trajectory prior as an inductive bias to generate diverse, high-reward behaviors in the first stage, before learning step-based policies in the second. In our empirical evaluation, we provide novel insights into shortcomings of skill-based diversity optimization and demonstrate that our curriculum improves the diversity of the learned skills.
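The abstract does not specify how the spline-based trajectory prior is implemented. As a minimal sketch, assuming a cubic spline interpolated through randomly drawn control points (all names and parameters here are hypothetical, not the authors' actual method), a stage-one trajectory could be sampled like this:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sample_spline_trajectory(horizon, action_dim, num_knots=5, scale=1.0, rng=None):
    """Sample one trajectory from a spline-based prior.

    Control points (knots) are drawn at random; a cubic spline
    interpolates them into a smooth trajectory over the episode
    horizon. Different knot samples yield diverse, smooth behaviors,
    which is the kind of inductive bias the abstract describes.
    """
    rng = np.random.default_rng() if rng is None else rng
    knot_times = np.linspace(0.0, 1.0, num_knots)       # knot placement in normalized time
    knots = scale * rng.standard_normal((num_knots, action_dim))
    spline = CubicSpline(knot_times, knots, axis=0)
    t = np.linspace(0.0, 1.0, horizon)
    return spline(t)                                     # shape: (horizon, action_dim)

# Sample a small population of candidate trajectories for the first stage.
trajectories = [sample_spline_trajectory(horizon=100, action_dim=7) for _ in range(16)]
```

Under this reading, the first stage would search over such trajectory parameterizations for diverse, high-reward solutions, which the second stage then distills into step-based policies.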
Primary Area: reinforcement learning
Submission Number: 18218