CoSMAC: A Benchmark for Evaluating Communication and Coordination in LLM-Based Agents

Published: 10 Jan 2026, Last Modified: 10 Jan 2026 · LaMAS 2026 Poster · CC BY 4.0
Keywords: Multiagent Systems, Large Language Models, Natural Language Communication, Computation and Language, Machine Learning
Abstract: Large Language Models (LLMs) have recently demonstrated strong reasoning and communication abilities, motivating research into their potential as autonomous agents in multi-agent systems. In this work, we introduce Communicative SMAC (CoSMAC), a benchmark designed to systematically evaluate the communication and coordination capabilities of LLM-based agents. Built upon the well-established SMAC multi-agent reinforcement learning (MARL) environment, CoSMAC features a set of scenarios requiring varying degrees of micromanagement and communication, in which agents must exchange information through natural language to achieve shared goals. We evaluate eight state-of-the-art open-source and proprietary LLMs in zero-shot settings, analyzing the model properties that are critical for communicative and cooperative behavior. Based on these results, we then distill the Qwen2.5-7B model via supervised fine-tuning on the resulting dataset. We further compare the performance of LLM-based agents against a well-known MARL baseline trained without communication. Experimental results show that while LLMs struggle in scenarios demanding fine-grained micromanagement and spatial coordination, they can outperform the MARL baseline in tasks that rely more heavily on effective communication.
Submission Number: 44