Marginal Benefit Induced Unsupervised Environment Design

23 Sept 2023 (modified: 25 Mar 2024) | ICLR 2024 Conference Withdrawn Submission
Keywords: Unsupervised Environment Design, Marginal Benefit, Diversity, Curriculum, RL
TL;DR: We propose an algorithm called MBeDED, which focuses on marginal benefit in Unsupervised Environment Design.
Abstract: Training generally capable Reinforcement Learning (RL) agents in complex environments is challenging, in part because it requires designing appropriate distributions of training environments. Recent research has highlighted the potential of the Unsupervised Environment Design (UED) framework, which uses a regret-based objective to generate an adaptive curriculum of environments at the frontier of the agent's capabilities. While regret-based approaches have shown great promise in generating feasible environments, they can produce environments that are too difficult for the agent to learn from. This is because regret represents the best-case learning potential of an environment and does not indicate how much the agent can actually learn from it. To address this limitation, we propose an alternative objective based on marginal benefit, which focuses on the improvement in the agent's policy associated with an environment. This objective generates environments at a pace suited to the agent's learning and thereby achieves rapid convergence. Additionally, to improve the generalizability of the student agent, we introduce a novel diversity metric that aims to generate varied experiences for the agent. Finally, we provide detailed experimental results and an ablation analysis to demonstrate the effectiveness of our methods. Notably, our approach points to controlled environment generation as a promising future direction within UED, a landscape currently dominated by algorithms based on random generation.
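Illustrative sketch (not from the submission): the abstract gives no formulas, so the toy Python example below only illustrates the contrast it draws between regret and marginal benefit. It assumes regret is the gap between the optimal return and the agent's current return on an environment, and it uses the change in the agent's return after one policy update as a stand-in for marginal benefit. The `LevelStats` class, its fields, and all numbers are hypothetical and do not reflect the authors' actual definitions or results.

```python
from dataclasses import dataclass


@dataclass
class LevelStats:
    """Hypothetical per-environment statistics; every field is illustrative."""
    name: str
    optimal_return: float  # return of an optimal policy on this environment
    return_before: float   # agent's return before training on it
    return_after: float    # agent's return after one policy update on it


def regret(level: LevelStats) -> float:
    # Assumed regret: gap between optimal and current agent return,
    # i.e. the best-case learning potential of the environment.
    return level.optimal_return - level.return_before


def marginal_benefit(level: LevelStats) -> float:
    # Assumed proxy for marginal benefit: the improvement the agent
    # actually realised from one update on this environment.
    return level.return_after - level.return_before


# Made-up numbers chosen only to show how the two scores can disagree.
levels = [
    LevelStats("too_hard", optimal_return=1.0, return_before=0.0, return_after=0.01),
    LevelStats("frontier", optimal_return=1.0, return_before=0.4, return_after=0.6),
    LevelStats("mastered", optimal_return=1.0, return_before=0.95, return_after=0.96),
]

best_by_regret = max(levels, key=regret)
best_by_benefit = max(levels, key=marginal_benefit)
print("highest regret:", best_by_regret.name)
print("highest marginal benefit:", best_by_benefit.name)
```

With these made-up numbers, the regret score favours the environment on which the agent makes almost no progress, while the marginal-benefit proxy favours the one on which it improves most. This is the failure mode of regret that the abstract describes.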
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7031