Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

Robby Costales; Stefanos Nikolaidis

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, Venue: NeurIPS 2024 poster, License: CC BY 4.0

Keywords: diversity, meta reinforcement learning, meta-RL, reinforcement learning, adaptation, adaptive, agents, open-endedness, genotypes, phenotypes, simulators, simulation, generalization, meta-reinforcement

TL;DR: DIVA is an evolutionary approach for generating meaningfully diverse meta-RL training tasks in truly open-ended simulators.

Abstract: The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot *generalization*—the goal of standard reinforcement learning (RL)—in favor of few-shot *adaptation*, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity—a similarly prohibitive assumption. In this work, we present **DIVA**, an evolutionary approach for generating diverse training tasks in such complex, open-ended simulators. Like unsupervised environment design (UED) methods, DIVA can be applied to arbitrary parameterizations, but can additionally incorporate realistically-available domain knowledge—thus inheriting the *flexibility* and *generality* of UED, and the supervised *structure* embedded in well-designed simulators exploited by DR and PG. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior, far outperforming competitive baselines from prior literature. These findings highlight the potential of such *semi-supervised environment design* (SSED) approaches, of which DIVA is the first humble constituent, to enable training in realistic simulated domains, and produce more robust and capable adaptive agents. 
Our code is available at [https://github.com/robbycostales/diva](https://github.com/robbycostales/diva).

Primary Area: Reinforcement learning

Submission Number: 16787
