Comparing Collective Behavior of LLM and Human Groups

Published: 23 Sept 2025, Last Modified: 18 Nov 2025 · ACA-NeurIPS2025 Poster · CC BY 4.0
Keywords: multi-agent systems, collective behavior, emergence, social simulation, ai safety
TL;DR: We use Dungeons and Dragons as a model social system to empirically compare human and LLM-agent groups, finding distinct emergent behaviors.
Abstract: Large language models (LLMs) are being deployed as agents in complex human social systems, which could impact human organizing and collective action. Yet most safety evaluations focus on one-on-one interactions, overlooking emergent group behaviors. Because we lack a quantitative baseline for comparison, there is a gap in our understanding of how the social dynamics of LLM-agent groups compare to those of humans. To address this, we use the role-playing game Dungeons & Dragons as a model social system, first analyzing a large human dataset of 985 games to establish a behavioral baseline and then using a multi-agent simulation to have LLMs play the same games under different prompting conditions. We measured emergent social dynamics through text-based metrics for creativity and group cohesion. In this preliminary work, we simulated seven games that mirror the characters, initial scenario, and turn order of specific human games, spanning 69--502 turns and 5--7 players. We find that LLM agents show lower emergent creativity and higher cohesion compared to human games, and that simple persona prompting does not align their behavior to the human baselines. These preliminary results reveal measurable social differences between LLM and human groups, suggesting that the integration of LLM agents into our social networks could impact how we collectively create and collaborate.
Submission Number: 38