Sharing Minds during MARL Training for Enhanced Cooperative LLM Agents

Published: 30 Oct 2024, Last Modified: 13 Dec 2024
Venue: LanGame Poster
License: CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning; LLM Agents; Theory of Mind
TL;DR: This work investigates the impact of explicitly augmenting Theory of Mind (ToM) capabilities during MARL training of LLM agents in multi-agent environments.
Abstract: LLM agents have shown promising capabilities by adopting advanced reasoning techniques such as Chain-of-Thought (CoT). Incorporating Theory of Mind (ToM) inference, which infers the goals and intentions of teammates, into the reasoning process has proven beneficial for enhancing the coordination ability of cooperative LLM agents. This work investigates the impact of explicitly augmenting ToM capabilities during MARL training of LLM agents in multi-agent environments. To enhance ToM capabilities, we introduce a novel technique, Mind-Sharing, which obtains the ground-truth answers for an agent's ToM inference during centralized training by rewriting the hidden minds of the other agent. Our experiments, conducted in the 2-player version of the cooperative game Hanabi, use MAPPO as the MARL algorithm and LLaMA-2-7B as the base model. We find that the Mind-Sharing mechanism significantly improves both task performance and sample efficiency in MARL training. Our results reveal enhanced ToM capability, surpassing the ToM inference accuracy of a wide range of models in the self-play setting. Surprisingly, the ToM inference skill learned from self-play also generalizes to the cross-play setting.
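The core of the Mind-Sharing idea, as described in the abstract, can be sketched as follows: during centralized training, one agent's ToM inference about its teammate is supervised with a ground-truth target obtained by rewriting the teammate's own hidden reasoning. This is a minimal illustrative sketch, not the paper's implementation; all class, function, and field names (`AgentStep`, `rewrite_mind_as_tom_target`, `hidden_mind`, etc.) are assumptions introduced for clarity.

```python
# Hypothetical sketch of the Mind-Sharing mechanism: the centralized
# trainer can see each agent's private reasoning ("hidden mind") and
# rewrites it into a supervision target for the other agent's ToM
# inference. Names and formats below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class AgentStep:
    observation: str
    hidden_mind: str  # agent's private reasoning, visible only to the trainer
    action: str


def rewrite_mind_as_tom_target(teammate_step: AgentStep) -> str:
    """Rewrite the teammate's hidden reasoning into a third-person
    statement that the other agent should learn to infer."""
    return f"My teammate intends to: {teammate_step.hidden_mind}"


def build_tom_training_pair(own_obs: str, teammate_step: AgentStep) -> dict:
    """Pair agent A's observation with a ToM target derived from
    agent B's hidden mind (only possible with centralized training)."""
    return {
        "prompt": f"Observation: {own_obs}\nWhat is your teammate thinking?",
        "target": rewrite_mind_as_tom_target(teammate_step),
    }


# Toy example in a Hanabi-like setting.
b_step = AgentStep(
    observation="I hold a red 2; my partner hinted 'red'.",
    hidden_mind="play the red 2 to continue the red firework",
    action="play card 1",
)
pair = build_tom_training_pair("Partner just received a 'red' hint.", b_step)
print(pair["target"])
```

In a decentralized-execution setting the hidden minds are unavailable, which is why the abstract emphasizes that this supervision is only collected during centralized training; at test time the agent must produce the ToM inference itself.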
Submission Number: 22