Playing Hanabi with ToM and Intrinsic Rewards

Published: 23 Jan 2023 (Last Modified: 05 May 2023), PKU CoRe 22Fall Poster
Abstract: In recent years, artificial agents have made drastic advances in multi-player games such as Go and poker. However, cooperative games with imperfect information remain relatively underexplored, even though such settings are common in daily human-robot interaction. The card game Hanabi is one example, where reasoning about other agents' mental states (e.g., beliefs and intentions) is brought to the foreground. It is a meaningful benchmark because it requires Theory of Mind (ToM) reasoning and challenges an agent's decision-making ability in a partially observable, cooperative setting. Existing work on Hanabi achieves strong self-play results, but fails to address these essential challenges in the other-play setting. To fill this gap, we propose two innovative plug-in modules that can be applied to general RL agents. The Hand Card Information Completion module models other agents' mental states and completes the partially observed environment information. The Goal-Oriented Intrinsic Reward module encourages agents' exploration and collaboration. We believe such attempts will boost performance in this particular game and facilitate human-robot cooperation in a broader range of interactive scenarios. Our code is available at \url{https://github.com/LoYuXr/Hanabi_Plugins}.
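The abstract does not specify how the Goal-Oriented Intrinsic Reward module is implemented; as a minimal sketch of the general pattern such a plug-in follows, the wrapper below adds a shaped intrinsic bonus on top of the extrinsic environment reward. The class name, the `beta` weight, and the `goal_progress` signal are all hypothetical illustrations, not the paper's actual API.

```python
class GoalOrientedIntrinsicReward:
    """Hypothetical sketch of an intrinsic-reward plug-in: adds a bonus
    proportional to the agent's progress toward a goal metric."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta           # weight of the intrinsic term (assumed)
        self.prev_progress = 0.0   # goal progress recorded at the last step

    def shape(self, extrinsic_reward: float, goal_progress: float) -> float:
        """Return extrinsic reward plus beta * (change in goal progress)."""
        bonus = self.beta * (goal_progress - self.prev_progress)
        self.prev_progress = goal_progress
        return extrinsic_reward + bonus


# Example: with beta = 0.5, moving goal progress from 0.0 to 0.2
# adds an intrinsic bonus of 0.1 to an extrinsic reward of 1.0.
shaper = GoalOrientedIntrinsicReward(beta=0.5)
shaped = shaper.shape(extrinsic_reward=1.0, goal_progress=0.2)
```

Using a progress *delta* (rather than raw progress) keeps the intrinsic term bounded over an episode, a common design choice in reward shaping; the paper's actual formulation may differ.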
Supplementary Material: zip
