Keywords: Meta-RL, Multi-Agent RL
TL;DR: We propose treating interaction with evolving agents in multi-agent RL as a meta-RL problem.
Abstract: In Multi-Agent RL, agents learn and evolve together, and each agent has to interact with a changing set of other agents. While generally viewed as a problem of non-stationarity, we propose that this can be viewed as a Meta-RL problem. We demonstrate an approach for learning Stackelberg equilibria, a type of equilibrium that features a bi-level optimization problem, where the inner level is a "best-response" of one or more follower agents to an evolving leader agent. Various approaches have been proposed in the literature to implement this best-response, most often treating each leader policy and the learning problem it induces for the follower(s) as a separate instance. We propose that the problem can be viewed as a meta (reinforcement) learning problem: Learning to learn to best-respond to different leader behaviors, by leveraging commonality in the induced follower learning problems. We demonstrate an approach using contextual policies and show that it matches performance of existing approaches using significantly fewer environment samples in experiments. We discuss how more advanced meta-RL techniques could allow this to scale to richer domains.