Long Short-Term Reasoning Network with Theory of Mind for Efficient Multi-Agent Cooperation

Xiyun Li, Tielin Zhang, Chenghao Liu, Linghui Meng, Bo Xu

Published: 01 Jan 2024, Last Modified: 15 May 2025. Venue: IJCNN 2024. License: CC BY-SA 4.0

Abstract: Enhancing the theory of mind (ToM) ability of agents is becoming increasingly critical in cooperative multi-agent reinforcement learning (MARL). ToM describes the ability of agents to first understand their partners’ logic and then reason accurately about their intentions and behaviors. In cognitive science, dual-reasoning pathway theory (DRPT) describes a statistical ToM process and claims that humans achieve accurate and rapid reasoning by combining long-term and short-term reasoning (LTR and STR) pathways that cover different brain regions. However, most existing works focus on the quick decision-making ability of STR while overlooking the significance of the long-term ToM reasoning ability provided by LTR. To emphasize this ability, we propose a long short-term reasoning (LSTR) algorithm, which contains a large language model (LLM) for long-term reasoning and an additional augmenting module that decodes the semantic space of the LLM into the action space of MARL. Experimental results demonstrate that our LSTR algorithm achieves significant improvements over competitive MARL methods (e.g., value-based QMIX and policy-based COMA) in terms of reward scores, convergence speed, and scalability.