Playing Hanabi with ToM and Intrinsic Rewards

Published: 23 Jan 2023 (Last Modified: 05 May 2023), PKU CoRe 22Fall Poster
Abstract: In recent years, artificial agents have made drastic advances in multi-player games such as Go and poker. However, cooperative games with imperfect information remain relatively underexplored, even though such settings are common in daily human-robot interaction. The card game Hanabi is one example, where reasoning about other agents' mental states (e.g., beliefs and intentions) is brought to the foreground. It is a meaningful benchmark because it requires Theory of Mind (ToM) reasoning and challenges an agent's decision-making ability in a partially observable, cooperative setting. Existing work on Hanabi achieves strong self-play results, but fails to address these essential challenges in the other-play setting. To fill this gap, we propose two innovative plug-in modules that can be applied to general RL agents. The Hand Card Information Completion module models other agents' mental states and completes the partially observed environment information. The Goal-Oriented Intrinsic Reward module encourages agents' exploration and collaboration. We believe such attempts will boost performance in this particular game and facilitate human-robot cooperation in a broader range of interactive scenarios. Our code is available at \url{https://github.com/LoYuXr/Hanabi_Plugins}.
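The abstract does not specify how the Goal-Oriented Intrinsic Reward module is implemented; as a minimal sketch of the general pattern such a plug-in follows, the wrapper below adds a shaped intrinsic bonus on top of the extrinsic environment reward. The class name, the `beta` weight, and the `goal_progress` signal are all hypothetical illustrations, not the paper's actual API.

```python
class GoalOrientedIntrinsicReward:
    """Hypothetical sketch of an intrinsic-reward plug-in: adds a bonus
    proportional to the agent's progress toward a goal metric."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta           # weight of the intrinsic term (assumed)
        self.prev_progress = 0.0   # goal progress recorded at the last step

    def shape(self, extrinsic_reward: float, goal_progress: float) -> float:
        """Return extrinsic reward plus beta * (change in goal progress)."""
        bonus = self.beta * (goal_progress - self.prev_progress)
        self.prev_progress = goal_progress
        return extrinsic_reward + bonus


# Example: with beta = 0.5, moving goal progress from 0.0 to 0.2
# adds an intrinsic bonus of 0.1 to an extrinsic reward of 1.0.
shaper = GoalOrientedIntrinsicReward(beta=0.5)
shaped = shaper.shape(extrinsic_reward=1.0, goal_progress=0.2)
```

Using a progress *delta* (rather than raw progress) keeps the intrinsic term bounded over an episode, a common design choice in reward shaping; the paper's actual formulation may differ.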
Supplementary Material: zip
