Keywords: Meta reinforcement learning, Distributional robustness
TL;DR: Meta Reinforcement Learning Resilient to Distribution Shift via Adaptive Distributional Robustness
Abstract: Meta-reinforcement learning algorithms provide a data-driven way to acquire learning algorithms that quickly adapt to many tasks with varying rewards or dynamics functions. However, learned meta-policies are often effective only on the exact task distribution on which the policy was trained, and struggle in the presence of distribution shift of test-time rewards or transition dynamics. In this work, we develop a framework for meta-RL algorithms that are able to behave appropriately under test-time distribution shifts in the space of tasks. Our framework centers on an adaptive approach to distributional robustness, in which we train a population of meta-agents to be robust to varying levels of distribution shift, so that when evaluated on a (potentially shifted) test-time distribution of tasks, we can adaptively choose the most appropriate meta-agent to follow. We formally show how this framework allows for improved regret under distribution shift, and empirically show its efficacy on simulated robotics problems under a wide range of distribution shifts.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/arxiv:2210.03104/code)