TL;DR: We designed a novel Extrinsic Behavioral Curiosity (EBC) module that significantly improves the efficiency and performance of Quality Diversity robot locomotion algorithms.
Abstract: Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel their behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, respectively, even surpassing expert performance by 20% on the Humanoid task. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for diverse robot locomotion. The source code of this work is provided at https://github.com/vanzll/EBC.
Lay Summary: Robots often learn how to move by observing expert demonstrations, a process called imitation learning. While this works well in controlled settings, it usually teaches the robot only one way to move. This lack of flexibility makes robots less capable in unpredictable or changing environments.
To improve the diversity of robot behaviors, we develop a new method called **Quality Diversity Inverse Reinforcement Learning (QD-IRL)**. This technique allows robots to learn many different ways to move—even from a small number of expert examples—making them more adaptable and robust.
As a key part of the QD-IRL algorithm, we propose **Extrinsic Behavioral Curiosity (EBC)**. EBC rewards the robot for trying new and different movement styles, not just following what it has already learned. It does this using an external system that tracks which behaviors are “novel” and encourages the robot to explore those.
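The novelty-driven reward described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the k-nearest-neighbor novelty measure, and the `beta` weighting are illustrative assumptions about how a curiosity bonus over a behavioral archive might be computed.

```python
import numpy as np

def novelty_bonus(behavior, archive, k=5):
    """Illustrative curiosity bonus: mean Euclidean distance from a
    behavior descriptor to its k nearest neighbors in the archive.
    A behavior far from everything already stored scores as novel."""
    if len(archive) == 0:
        return 1.0  # any behavior is novel with respect to an empty archive
    dists = np.linalg.norm(np.asarray(archive) - behavior, axis=1)
    k = min(k, len(dists))
    return float(np.mean(np.sort(dists)[:k]))

# Hypothetical usage: add the curiosity bonus to an imitation reward.
archive = [np.array([0.1, 0.2]), np.array([0.8, 0.9])]  # stored descriptors
behavior = np.array([0.5, 0.5])                         # current rollout
beta = 0.1  # curiosity weight (hypothetical hyperparameter)
imitation_reward = 1.0
total_reward = imitation_reward + beta * novelty_bonus(behavior, archive)
```

In this sketch, behaviors resembling ones already in the archive receive little bonus, while unfamiliar movement styles are rewarded, which is the intuition behind EBC's exploration pressure.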
We test our approach in various simulated robot environments (such as walking, jumping, or adapting to damage) and find that robots trained with EBC display a wider variety of effective movement styles. In some cases, they even perform better than the original expert demonstrations.
While we apply our approach to a particular algorithm, Proximal Policy Gradient Arborescence, it can potentially be used in a wide variety of quality-diversity algorithms, and in traditional reinforcement learning in addition to imitation learning. In these settings, EBC could be a powerful tool for teaching robots to handle complex, real-world challenges with diverse and creative behavior.
Link To Code: https://github.com/vanzll/EBC
Primary Area: Reinforcement Learning->Inverse
Keywords: Quality Diversity, Reinforcement Learning, Imitation Learning
Submission Number: 9327