Benchmarking LLMs' Swarm Intelligence

ACL ARR 2026 January Submission 2512 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Swarm Intelligence, LLM, Benchmark
Abstract: Large Language Models (LLMs) show promise as autonomous agents, yet their capacity for decentralized coordination remains underexplored, particularly in scenarios where agents operate with limited local perception and no centralized control. We introduce **SwarmBench**, a benchmark for evaluating emergent coordination in LLM-based swarms. The benchmark features five tasks in a physics-grounded 2D environment, requiring agents to achieve collective objectives through local interactions. Evaluating thirteen LLMs in a zero-shot setting, we find that no model achieves consistent cross-task success. We identify a **communication-coordination gap**: while agents are strongly influenced by peer messages, this linguistic alignment fails to produce effective collective action. This gap manifests in failure modes including spatial congestion, information silos, and protocol rigidity, revealing that current LLMs lack the grounded reasoning necessary for robust swarm intelligence. We release SwarmBench as an open-source toolkit for decentralized LLM coordination research.
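To make the setup concrete, below is a minimal sketch of the kind of decentralized loop the abstract describes: each agent perceives only a small local window and its peer messages, and acts with no central controller. All names here (`Agent`, `local_view`, `policy`, `step`, the grid and view sizes) are illustrative assumptions and do not reflect the actual SwarmBench API.

```python
# Illustrative sketch only: class and function names are hypothetical
# and are NOT the actual SwarmBench toolkit API.
import random
from dataclasses import dataclass, field

GRID = 16  # side length of the 2D grid world (assumed)
VIEW = 2   # agents perceive a (2*VIEW+1) x (2*VIEW+1) local window (assumed)

@dataclass
class Agent:
    x: int
    y: int
    inbox: list = field(default_factory=list)  # messages from nearby peers

def local_view(agents, me):
    """Return relative positions of peers inside the agent's limited window."""
    return [(a.x - me.x, a.y - me.y) for a in agents
            if a is not me and abs(a.x - me.x) <= VIEW and abs(a.y - me.y) <= VIEW]

def policy(view, inbox):
    """Stand-in for an LLM call: the agent decides from local info only.
    A real harness would serialize `view` and `inbox` into a prompt here."""
    return random.choice(["up", "down", "left", "right", "stay"])

def step(agents):
    """One decentralized tick: every agent acts on its own observation.
    There is no shared global state and no central controller."""
    moves = {"up": (0, -1), "down": (0, 1),
             "left": (-1, 0), "right": (1, 0), "stay": (0, 0)}
    decisions = [policy(local_view(agents, a), a.inbox) for a in agents]
    for a, d in zip(agents, decisions):
        dx, dy = moves[d]
        a.x = min(max(a.x + dx, 0), GRID - 1)  # clamp to grid bounds
        a.y = min(max(a.y + dy, 0), GRID - 1)
        a.inbox.clear()  # messages are consumed each tick

agents = [Agent(random.randrange(GRID), random.randrange(GRID)) for _ in range(8)]
for _ in range(10):
    step(agents)
```

Restricting each decision to `local_view` plus `inbox` mirrors the limited-perception, message-passing setting under which the communication-coordination gap is observed.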
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM/AI agents
Languages Studied: English
Submission Number: 2512