Context and History Aware Other-Shaping

Anonymous

08 Oct 2022 (modified: 05 May 2023)
Submitted to Deep RL Workshop 2022
Readers: Everyone
Keywords: Opponent Shaping, Meta Reinforcement Learning, Multi-Agent
TL;DR: A novel opponent-shaping algorithm that is scalable and applicable to grid worlds.
Abstract: Cooperation failures, in which self-interested agents converge to collectively worst-case outcomes, are a common failure mode of Multi-Agent Reinforcement Learning (MARL) methods. Methods such as Model-Free Opponent Shaping (M-FOS) and The Good Shepherd address this issue by shaping their co-player’s learning towards mutual cooperation. However, these methods either fail to capture important co-player learning dynamics or do not scale to co-players parameterised by deep neural networks. To address these issues, we propose Context and History Aware Other-Shaping (CHAOS). A CHAOS agent is a meta-learner parameterised by a recurrent neural network that learns to shape its co-player over multiple trials. CHAOS considers both context (inter-episode information) and history (intra-episode information) to shape co-players successfully. CHAOS also scales to shaping co-players parameterised by deep neural networks. In a set of experiments, we show that CHAOS achieves state-of-the-art shaping in matrix games. We provide extensive ablations that motivate the importance of both context and history. CHAOS also successfully shapes co-players in a complex grid-world-based game, demonstrating its scalability empirically. Finally, we provide empirical evidence that, counterintuitively, the widely used Coin Game environment does not require history to learn shaping, because states are often indicative of past actions. This suggests that the Coin Game is, contrary to common understanding, unsuitable for investigating shaping in high-dimensional, multi-step environments.
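
The distinction the abstract draws between context (inter-episode information) and history (intra-episode information) can be illustrated with a toy recurrent state. The sketch below is not the authors' implementation: the scalar cell `rnn_step`, its weights, the helper `run_trial`, and the `keep_context` flag are all hypothetical names chosen for illustration. It only shows one way a recurrent state might be carried across episodes within a trial, as opposed to being reset at every episode boundary.

```python
import math


def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar recurrent cell (hypothetical stand-in for an RNN)."""
    return math.tanh(w_h * h + w_x * x)


def run_trial(episodes, keep_context):
    """Process a trial made of several episodes of scalar observations.

    Returns the hidden state the agent holds at the start of each episode.
    keep_context=True:  the state persists across episodes (context and history).
    keep_context=False: the state is reset at each episode (history only).
    """
    h = 0.0
    episode_start_states = []
    for episode in episodes:
        if not keep_context:
            h = 0.0  # discard everything learned in earlier episodes
        episode_start_states.append(h)
        for x in episode:
            h = rnn_step(h, x)  # history: information within the episode
    return episode_start_states


# Two short episodes of made-up observations (assumed data, for illustration only).
episodes = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
history_only = run_trial(episodes, keep_context=False)
with_context = run_trial(episodes, keep_context=True)
print(history_only)
print(with_context)
```

With `keep_context=False` every episode starts from a zero state, so nothing about the co-player's earlier behaviour is available; with `keep_context=True` the second episode starts from a non-zero state that summarises the first.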
Supplementary Material: zip