Inducing Cooperation via Learning to reshape rewards in semi-cooperative multi-agent reinforcement learning

David Earl Hostallero; Daewoo Kim; Kyunghwan Son; Yung Yi

Inducing Cooperation via Learning to reshape rewards in semi-cooperative multi-agent reinforcement learning

David Earl Hostallero, Daewoo Kim, Kyunghwan Son, Yung Yi

27 Sept 2018 (modified: 05 May 2023)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: We propose a deep reinforcement learning algorithm for semi-cooperative multi-agent tasks, where agents are equipped with their separate reward functions, yet with willingness to cooperate. Under these semi-cooperative scenarios, popular methods of centralized training with decentralized execution for inducing cooperation and removing the non-stationarity problem do not work well due to lack of a common shared reward as well as inscalability in centralized training. Our algorithm, called Peer-Evaluation based Dual DQN (PED-DQN), proposes to give peer evaluation signals to observed agents, which quantifies how they feel about a certain transition. This exchange of peer evaluation over time turns out to render agents to gradually reshape their reward functions so that their action choices from the myopic best-response tend to result in the good joint action with high cooperation. This evaluation-based method also allows flexible and scalable training by not assuming knowledge of the number of other agents and their observation and action spaces. We provide the performance evaluation of PED-DQN for the scenarios ranging from a simple two-person prisoner's dilemma to more complex semi-cooperative multi-agent tasks. In special cases where agents share a common reward function as in the centralized training methods, we show that inter-agent evaluation leads to better performance

Keywords: multi-agent reinforcement learning, deep reinforcement learning, multi-agent systems

TL;DR: We use an peer evaluation mechanism to make semi-cooperative agents learn collaborative strategies in multiagent reinforcement learning settings

8 Replies

Loading