Multi-Agent AI Systems Need Institutional Design, Not Just Model-Level Alignment

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, AI4GOOD Workshop 2026 Spotlight, CC BY 4.0
Keywords: multi-agent AI, AI agents, cooperation, institutional design, AI safety, sanctioning mechanisms, public goods games, mechanism design, agent societies
Abstract: Autonomous AI agents powered by large language models are increasingly embedded in multi-agent settings, where success depends on cooperation and on sharing tools, finite resources, action space, and decision authority. These shared resources form artificial commons, in which familiar collective-action failures can reappear, including free-riding, over-extraction, miscoordination, costly enforcement, punishment cascades, and failures of repair. Current alignment techniques, geared towards producing appropriate responses to humans, fall short of solving this problem. This position paper argues that future AI safety research must evaluate agent societies as governed systems, not merely as collections of individually aligned models. We focus on sanctions because they are the point at which institutional rules acquire consequences. However, our central claim is not that agent societies need more punishment. We make three contributions. First, we argue for evaluating the model-in-institution, since individually aligned agents can still produce collectively harmful outcomes. Second, we distinguish sanctions from the institutional functions that make sanctions interpretable, legitimate, proportional, and reversible. Third, we propose an evaluation agenda for testing cooperation-shaping institutions across repeated multi-agent dilemmas, varying not only model family and task, but also resource structure, group composition, enforcement architecture, and repair pathways. The goal is not simply to maximize cooperation, but to design agent institutions that dynamically sustain beneficial forms of cooperation.
Submission Number: 308