Grounding Aleatoric Uncertainty in Unsupervised Environment Design

12 Oct 2021, 19:37 (modified: 28 Nov 2021, 10:54)
Deep RL Workshop NeurIPS 2021
Readers: Everyone
Keywords: reinforcement learning, curriculum learning, covariate shift, generalization, environment design, procedural content generation
TL;DR: We characterize how curriculum learning can induce suboptimal reinforcement learning policies with respect to a ground-truth distribution of environments, and propose a method for correcting this effect.
Abstract: In reinforcement learning (RL), adaptive curricula have proven highly effective for learning policies that generalize well under a wide variety of changes to the environment. Recently, the framework of Unsupervised Environment Design (UED) generalized the notion of a curriculum for RL to the generation of entire environments, leading to the development of new methods with robust minimax-regret properties. However, in partially-observable or stochastic settings (those featuring aleatoric uncertainty), optimal policies may depend on the ground-truth distribution over the aleatoric features of the environment. Such settings are potentially problematic for curriculum learning, which necessarily shifts the environment distribution used during training away from the fixed ground-truth distribution of the intended deployment environment. We formalize this phenomenon as curriculum-induced covariate shift, and describe how, when the distribution shift occurs over such aleatoric environment parameters, it can lead to learning suboptimal policies. We then propose a method that, given black-box access to a simulator, corrects the resulting bias by aligning the advantage estimates to the ground-truth distribution over aleatoric parameters. This approach leads to a minimax-regret UED method, SAMPLR, with Bayes-optimal guarantees.
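To make the covariate-shift argument concrete, here is a minimal toy sketch. It is an illustration under stated assumptions, not the SAMPLR method: the one-step "coin" environment, the probabilities, the reward values, and every function name are hypothetical, and the correction shown is plain importance weighting, swapped in as a simple stand-in for aligning value estimates to the ground-truth distribution. The agent picks between a safe action and a gamble whose payoff depends on a hidden coin it cannot observe (the aleatoric parameter).

```python
# Hypothetical toy example; none of these names or numbers come from the paper.

GROUND_TRUTH_P = 0.3  # assumed P(heads) in the deployment environment
CURRICULUM_P = 0.8    # assumed P(heads) under a curriculum that over-samples heads
SAFE_REWARD = 0.5     # assumed fixed reward for the safe action


def expected_returns(p_heads):
    """Expected return per action when the hidden coin is heads with prob. p_heads.

    The gamble pays 1 on heads and 0 on tails; the safe action ignores the coin.
    """
    return {"safe": SAFE_REWARD, "gamble": p_heads * 1.0}


def greedy_action(values):
    """Return the action with the highest expected return."""
    return max(values, key=values.get)


def reweighted_returns(curriculum_p, ground_truth_p):
    """Expected returns under curriculum sampling, importance-weighted so that
    they match the ground-truth coin distribution (a stand-in correction)."""
    w_heads = ground_truth_p / curriculum_p
    w_tails = (1.0 - ground_truth_p) / (1.0 - curriculum_p)
    gamble = curriculum_p * w_heads * 1.0 + (1.0 - curriculum_p) * w_tails * 0.0
    return {"safe": SAFE_REWARD, "gamble": gamble}


biased_choice = greedy_action(expected_returns(CURRICULUM_P))
optimal_choice = greedy_action(expected_returns(GROUND_TRUTH_P))
corrected_choice = greedy_action(reweighted_returns(CURRICULUM_P, GROUND_TRUTH_P))
```

In this sketch, a policy fit to the curriculum's coin distribution prefers the gamble, while the optimal choice under the assumed ground truth is the safe action; reweighting the value estimates toward the ground-truth distribution recovers the latter.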