Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration

Borja G. León; Francesco Riccio; Kaushik Subramanian; Peter R. Wurman; Peter Stone

Published: 25 Sept 2024, Last Modified: 19 Dec 2024, Venue: NeurIPS 2024 poster, License: CC BY 4.0

Keywords: Reinforcement Learning, Policy Diversity, Generalization

TL;DR: A novel algorithm for learning diverse, near-optimal policies that generalize both in-distribution and out-of-distribution

Abstract: The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to produce policies that settle on a single solution to a given problem, making them brittle to problem variations. Replicating human flexibility in reinforcement learning agents is the challenge that we explore in this work. We tackle this challenge by extending state-of-the-art approaches to introduce DUPLEX, a method that explicitly defines a diversity objective with constraints and makes robust estimates of policies’ expected behavior through successor features. The trained agents can (i) learn a diverse set of near-optimal policies in complex, highly dynamic environments and (ii) exhibit competitive and diverse skills in out-of-distribution (OOD) contexts. Empirical results indicate that DUPLEX improves over previous methods and successfully learns competitive driving styles in a hyper-realistic simulator (i.e., Gran Turismo™ 7) as well as diverse and effective policies in several multi-context robotics MuJoCo simulations with OOD gravity forces and height limits. To the best of our knowledge, our method is the first to achieve diverse solutions in complex driving simulators and OOD robotic contexts. DUPLEX agents demonstrating diverse behaviors can be found at https://ai.sony/publications/Discovering-Creative-Behaviors-through-DUPLEX-Diverse-Universal-Features-for-Policy-Exploration/.

Primary Area: Reinforcement learning

Submission Number: 2436
