Trajectory First: A Curriculum for Discovering Diverse Policies

Cornelius V. Braun; Sayantan Auddy; Marc Toussaint

Published: 22 Jun 2025, Last Modified: 27 Jul 2025

Venue: IBRL @ RLC 2025

License: CC BY 4.0

Keywords: Skill Discovery, Exploration, Skill Diversity

Abstract: Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To address this, we propose a two-stage curriculum for diversity optimization in RL. The key idea of our method is to leverage a structured spline-based trajectory prior as an inductive bias to seed diverse, high-reward behaviors before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcomings of skill-based diversity optimization and demonstrate that our curriculum improves the diversity of the learned skills.

Submission Number: 20
