Planning with Theory of Mind for Few-Shot Adaptation in Sequential Social Dilemmas

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Few-shot adaptation to unknown policies, Opponent modeling, Multi-agent reinforcement learning, Mixed-motive game, Decentralized training, Monte Carlo Tree Search
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to other agents in mixed-motive environments remains a significant challenge. One feasible approach is to use Theory of Mind (ToM) to reason about the mental states of other agents and model their behaviors. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred information. To address these issues, we propose Planning with Theory of Mind (PToM), a novel multi-agent algorithm that enables few-shot adaptation to unseen policies in sequential social dilemmas (SSDs). PToM is hierarchically composed of two modules: an opponent modeling module that utilizes ToM to infer others' goals and learn corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others' goals both between and within episodes and by using information from the opponent modeling module to guide planning. Experimental results demonstrate that in three representative SSD paradigms, PToM converges expeditiously, excels in self-play scenarios, and exhibits superior few-shot adaptation capabilities when interacting with various unseen agents. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.
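The two-module loop described in the abstract — updating beliefs about others' goals and then planning a best response — can be sketched minimally as a Bayesian belief update over candidate goals, followed by an expected-value best-response step that stands in for the full MCTS planner. This is an illustrative sketch under assumed names and payoffs, not the paper's implementation:

```python
def update_belief(belief, likelihoods):
    """Bayesian update of the belief over an opponent's goal.

    belief: dict mapping goal -> prior probability
    likelihoods: dict mapping goal -> P(observed action | goal),
        as would come from a goal-conditioned opponent model
    """
    posterior = {g: belief[g] * likelihoods[g] for g in belief}
    z = sum(posterior.values())
    if z == 0:
        return dict(belief)  # observation is uninformative; keep the prior
    return {g: p / z for g, p in posterior.items()}


def best_response(belief, payoff):
    """Choose the action maximizing expected payoff under the goal belief.

    payoff: dict mapping (action, goal) -> estimated return; in the
        full algorithm these estimates would come from MCTS rollouts.
    """
    actions = {a for (a, _) in payoff}
    expected = lambda a: sum(belief[g] * payoff[(a, g)] for g in belief)
    return max(actions, key=expected)
```

For example, after observing an action that is far more likely under a cooperative goal, the belief shifts toward "cooperate", and the best response changes accordingly. The belief update can be applied both between and within episodes, mirroring the adaptation scheme the abstract describes.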
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5187