Keywords: Optimisation, Environment Design, Reinforcement Learning, Robustness
TL;DR: We use min-max optimisation techniques to derive a convergence guarantee for Unsupervised Environment Design, and then demonstrate the method's practical effectiveness experimentally.
Abstract: For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods relied on guarantees that hold only *if* they reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a *provably convergent* algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments of varying difficulty.
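To illustrate the kind of min-max structure the abstract refers to, the following is a minimal sketch of two-timescale gradient descent-ascent (a standard approach for nonconvex-strongly-concave objectives) on a toy function. The objective, step sizes, and iteration count here are hypothetical illustrations, not taken from the paper.

```python
# Toy nonconvex-strongly-concave objective:
#   f(x, y) = (x^2 - 1)^2 + x*y - y^2
# Nonconvex in the minimising variable x, strongly concave in the
# maximising variable y (the -y^2 term guarantees strong concavity).

def grad_x(x, y):
    # Partial derivative of f with respect to x
    return 4 * x * (x**2 - 1) + y

def grad_y(x, y):
    # Partial derivative of f with respect to y
    return x - 2 * y

x, y = 2.0, 0.0
eta_x, eta_y = 0.01, 0.1  # two-timescale: slower descent, faster ascent
for _ in range(5000):
    x -= eta_x * grad_x(x, y)  # descent on the minimising player
    y += eta_y * grad_y(x, y)  # ascent on the maximising player

# The inner maximiser's best response is y*(x) = x/2, so x effectively
# descends Phi(x) = (x^2 - 1)^2 + x^2/4, whose positive stationary
# point satisfies 4x(x^2 - 1) + x/2 = 0, i.e. x^2 = 0.875.
print(x, y)
```

The fast inner ascent step is what lets the outer player see an approximately best-responding adversary, which is the usual mechanism behind convergence results in this class of objectives.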
Submission Number: 238