That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities

Jack Parker-Holder; Minqi Jiang; Michael D Dennis; Mikayel Samvelyan; Jakob Nicolaus Foerster; Edward Grefenstette; Tim Rocktäschel

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone

Keywords: Reinforcement Learning, Unsupervised Environment Design

Abstract: Deep Reinforcement Learning (RL) has recently produced impressive results in a series of settings such as games and robotics. However, a key challenge that limits the utility of RL agents for real-world problems is their limited ability to generalize to unseen variations (or levels). To train more robust agents, the field of Unsupervised Environment Design (UED) seeks to produce a curriculum by updating both the agent and the distribution over training environments. Recent advances in UED have come from promoting levels with high regret, which provides theoretical guarantees in equilibrium and has been shown empirically to produce agents capable of zero-shot transfer to unseen human-designed environments. However, current methods require either learning an environment-generating adversary, which remains a challenging optimization problem, or curating a curriculum from randomly sampled levels, which is ineffective if the search space is too large. In this paper, we instead propose to evolve a curriculum by making edits to previously selected levels. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), produces levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior works, while outperforming them empirically when transferring to complex out-of-distribution environments.

One-sentence Summary: Generating curricula for RL agents by making edits to levels which previously had high learning potential, making sure the agent is constantly tested at the frontier of its capabilities.
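
The idea of evolving a curriculum by editing previously selected levels can be illustrated with a toy loop. The sketch below is not the authors' implementation: a level is reduced to a single integer "complexity", the agent to a single float "skill", and `regret_proxy`, `edit`, `run_curriculum`, and all parameter values are hypothetical names and choices made only for illustration.

```python
import random


def regret_proxy(level, skill):
    """Hypothetical stand-in for regret: peaks when the level is one unit
    beyond the agent's skill, and is zero for levels far too easy or too hard."""
    gap = level - skill
    return max(0.0, 1.0 - abs(gap - 1.0))


def edit(level, rng):
    """Small random mutation of a level (here: complexity goes up or down by 1)."""
    return max(0, level + rng.choice([-1, 1]))


def run_curriculum(steps=200, buffer_size=8, threshold=0.5, seed=0):
    """Toy edit-based curriculum: train on a buffered level, edit it,
    and keep the edited level only if its score is high enough."""
    rng = random.Random(seed)
    skill = 0.0
    buffer = [0]  # start from the simplest possible level
    for _ in range(steps):
        parent = rng.choice(buffer)
        # "Training" is assumed to improve skill in proportion to the score.
        skill += 0.1 * regret_proxy(parent, skill)
        child = edit(parent, rng)
        if regret_proxy(child, skill) >= threshold:
            buffer.append(child)
            # Keep only the highest-scoring levels under the current skill.
            buffer.sort(key=lambda lvl: regret_proxy(lvl, skill), reverse=True)
            del buffer[buffer_size:]
    return skill, buffer


if __name__ == "__main__":
    final_skill, levels = run_curriculum()
    print(f"skill={final_skill:.2f}, buffer={levels}")
```

In this toy setting the buffer starts with the simplest level and only admits edited levels whose score clears the threshold, so the stored levels tend to track the region just beyond the agent's current skill. A real system would replace the scalar skill with a trained policy and the proxy with an actual regret estimate.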
