Mechanism Plausibility in Generative Agent-Based Modeling

Patrick Zhao; David Huu Pham; Nicholas Vincent

Published: 09 May 2026; Last Modified: 09 May 2026; Venue: PoliSim@CHI 2026; License: CC BY 4.0

Keywords: Agent-Based Modeling, Mechanisms, Generative Agents, Large Language Models, Philosophy of Science

Abstract: Large language models (LLMs) can generate high-level phenomena without explicitly programmed rules; this capability has led to their adoption within agent-based models (ABMs) and social simulations. Many recent research papers aim to test whether LLMs are capable of generating different phenomena of interest, for example, human behavior on social media platforms or performance in game-theoretic scenarios. However, capability and explanation are different. Drawing from the philosophy of science and mechanisms literature, explanation requires showing, to some degree, how a phenomenon is produced by related organized entities and activities. For new modelers, providing such explanations can be difficult without grounding in seemingly distant research areas. We integrate recent work on generative ABMs with contemporary philosophy of science literature and make two main contributions. First, we gather insights on modeling from the simulation and mechanisms literature and use them to operationalize a definition of plausibility as a four-level scale. Our formalization separates the evaluation of a model's generative sufficiency (its ability to reproduce a phenomenon) from its mechanistic plausibility (how the phenomenon could be produced). We introduce this as the Mechanism Plausibility Scale, whose level increases as more parts of the model and its mechanisms are operationalized, falsifiable, and backed by evidence. Second, we discuss recent LLM-ABM work and find that it conflates evidence of agent-level functionality with claims about emergent ABM-level phenomena, relying on 'believability' metrics that focus on generative sufficiency. Our discussion addresses how these findings echo long-standing problems in classical ABM, the historical harms these problems have caused, and other considerations on LLM usage for the modeling community.
Using the findings from our review, we offer the scale as a practical heuristic in the form of a checklist that can clarify how simulations at different levels of plausibility may be useful.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 19
