Interactive Robust Policy Optimization for Multi-Agent Reinforcement Learning

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop, NeurIPS 2021
Keywords: Reinforcement learning, Multi-agent systems, Non-stationarity, Game theory, Robustness, Sim2real transfer
TL;DR: A strategic, game-theoretic framework for multi-agent reinforcement learning that is robust to adversarial disturbances and enables better sim2real transfer.
Abstract: As machine learning is increasingly applied to real-world problems such as robotics, control of autonomous vehicles, drones, and recommendation systems, it becomes essential to consider the notion of agency, where multiple agents with local observations affect one another as they interact to achieve their goals. Multi-agent reinforcement learning (MARL) is concerned with developing learning algorithms that can discover effective policies in multi-agent environments. In this work, we develop algorithms that address two critical challenges in MARL: non-stationarity and robustness. We show that naive independent reinforcement learning does not preserve the strategic game-theoretic interaction between the agents, and we present a way to realize classical infinite-order recursive reasoning in a reinforcement learning setting. We refer to this framework as Interactive Policy Optimization (IPO) and derive four MARL algorithms, based on centralized training with decentralized execution, that generalize widely used single-agent policy gradient methods to multi-agent settings. Finally, we provide a method for estimating an opponent's parameters in adversarial settings using maximum likelihood, and we integrate IPO with an adversarial learning framework to train agents that are robust to destabilizing disturbances from the environment or adversaries, enabling better sim2real transfer from simulated multi-agent environments to the real world.
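The abstract does not spell out the estimator, but as a rough illustration of what maximum-likelihood opponent modeling can look like, the sketch below fits a hypothetical linear-softmax opponent policy to observed state-action pairs by gradient ascent on the log-likelihood. The function name `estimate_opponent_params`, the linear-softmax policy form, and the hyperparameters are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def estimate_opponent_params(states, actions, n_actions, lr=0.1, iters=500):
    """Illustrative sketch (not the paper's method): fit an assumed
    linear-softmax opponent policy pi(a|s) = softmax(s @ W) by maximizing
    the log-likelihood of observed (state, action) pairs."""
    n, d = states.shape
    W = np.zeros((d, n_actions))            # opponent parameter estimate
    onehot = np.eye(n_actions)[actions]     # (n, n_actions) indicators of taken actions
    for _ in range(iters):
        probs = softmax(states @ W)         # predicted action distribution per state
        grad = states.T @ (onehot - probs) / n   # gradient of mean log-likelihood
        W += lr * grad                      # gradient ascent step
    return W

# Usage: observe a synthetic opponent in a 2-action setting and recover its parameters.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(4, 2))
S = rng.normal(size=(1000, 4))
A = np.array([rng.choice(2, p=p) for p in softmax(S @ true_W)])
W_hat = estimate_opponent_params(S, A, n_actions=2)
```

In the adversarial setting described in the abstract, an estimate like `W_hat` would stand in for the unknown opponent parameters when training the agent's own policy; the actual coupling to IPO's recursive reasoning and adversarial training is detailed in the paper, not here.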