Abstract: Contemporary embodied agents powered by large language models (LLMs), such as Voyager, have shown promising capabilities in individual learning within open-ended environments like Minecraft. However, when powered by open LLMs, they struggle with basic tasks even after domain-specific fine-tuning. We present MindForge, a generative-agent framework for collaborative lifelong learning through explicit perspective taking. We introduce three key innovations: (1) a structured theory of mind representation linking percepts, beliefs, desires, and actions; (2) natural interagent communication; and (3) a multicomponent memory system. In Minecraft experiments, MindForge agents powered by open-weight LLMs significantly outperform their Voyager counterparts in basic tasks where traditional Voyager fails without GPT-4, collecting $2.3\times$ more unique items and achieving $3\times$ more tech-tree milestones, advancing from basic wood tools to advanced iron equipment. MindForge agents demonstrate sophisticated behaviors, including expert-novice knowledge transfer, collaborative problem solving, and adaptation to out-of-distribution tasks through accumulated collaborative experiences. MindForge advances the democratization of embodied AI development through open-ended social learning, enabling peer-to-peer knowledge sharing.
Loading