Exploiting Environmental Variation to Improve Policy Robustness in Reinforcement Learning

Siddharth Mysore; Robert Platt; Kate Saenko

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: Conventional reinforcement learning rarely considers how physical variations in the environment (e.g., mass and drag) affect the policy learned by the agent. In this paper, we explore how changes in the environment affect policy generalization. We observe experimentally that, for each task we considered, there exists an optimal environment setting for training that yields the most robust policy, one that generalizes well to future environments. We propose a novel method that exploits this observation to develop robust actor policies by automatically constructing a sampling curriculum over the environment settings used in training. Our approach is model-free, and experiments demonstrate that its performance is on par with the best policies found by an exhaustive grid search, at a significantly lower computational cost.

Keywords: Reinforcement Learning, Policy Robustness, Policy Generalization, Automated Curriculum

TL;DR: By formulating the learning curriculum as a bandit problem, we present a principled approach to encouraging policy robustness in continuous control tasks.

