Status-Quo Policy Gradient in Multi-agent Reinforcement Learning

Pinkesh Badjatiya; Mausoom Sarkar; Abhishek Sinha; Nikaash Puri; Jayakumar Subramanian; Siddharth Singh; Balaji Krishnamurthy

28 Sept 2020 (modified: 05 May 2023). ICLR 2021 Conference Blind Submission. Readers: Everyone

Keywords: multi-agent rl, reinforcement learning, social dilemma, policy gradient, game theory

Abstract: Individual rationality, which involves maximizing expected individual return, does not always lead to optimal individual or group outcomes in multi-agent problems. For instance, in social dilemma situations, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to mutual defection, which is both individually and socially sub-optimal. In contrast, humans evolve individually and socially optimal strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior in humans to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss evolve individually as well as socially optimal behavior in several social dilemma matrix games. To apply SQLoss to games where cooperation and defection are determined by a sequence of non-trivial actions, we present GameDistill, an algorithm that reduces a multi-step game with visual input to a matrix game. We empirically show that agents trained with SQLoss on a GameDistill-reduced version of the Coin Game evolve optimal policies.
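
The social dilemma the abstract refers to can be illustrated with a one-shot Prisoner's Dilemma. The sketch below is not the paper's SQLoss or GameDistill; it only shows why payoff-maximizing agents end up at mutual defection even though mutual cooperation is better for both. The payoff values and the helper names (`PAYOFF`, `best_response`, `social_welfare`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a social dilemma matrix game (Prisoner's Dilemma).
# Payoff values are illustrative assumptions, not the paper's.
C, D = 0, 1  # cooperate, defect

# PAYOFF[(a1, a2)] = (reward to agent 1, reward to agent 2)
PAYOFF = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}


def best_response(opponent_action):
    """Agent 1's reward-maximizing action against a fixed opponent action."""
    return max((C, D), key=lambda a: PAYOFF[(a, opponent_action)][0])


def social_welfare(a1, a2):
    """Sum of both agents' rewards for a joint action."""
    return sum(PAYOFF[(a1, a2)])


# Defection is the best response to either opponent action (a dominant strategy).
defection_is_dominant = all(best_response(b) == D for b in (C, D))

# The joint action with the highest total reward is mutual cooperation.
best_joint = max(PAYOFF, key=lambda joint: social_welfare(*joint))

print("defection dominant:", defection_is_dominant)
print("welfare-maximizing joint action:", best_joint)
```

In this toy game each agent receives -2 under mutual defection but -1 under mutual cooperation, so the individually rational outcome is worse for both agents, which is the gap the abstract says SQLoss is designed to close.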

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=nIyzrXQdWv
