Keywords: Institutions, Norms, Cooperative AI, Multi-agent Systems, Reinforcement Learning, Alignment
TL;DR: Normative institutions can play a key role in driving alignment and fostering cooperation in AI agents.
Abstract: Cooperation is central to human societies, which achieve it by constantly tackling the alignment problem of ensuring that self-interested individuals act in ways that benefit the groups in which they live. As AI agents become pervasive in shared environments, it will be similarly crucial for them to align with the cooperative goals of human groups. Current AI alignment research largely focuses on embedding specified or learned norms into agents to achieve this cooperation. While valuable, this approach overlooks the role that institutions play in aligning human behavior to achieve cooperative gains, and thus overlooks a potential alignment technique for AI agents. We address this gap by proposing Altared Games, a novel formal extension of Markov games that incorporates an altar, a classification institution that provides explicit normative guidance to agents. Our approach focuses on a challenging setting where norms are dynamic, requiring agents to adapt to the evolving norm content represented by the altar. Using multi-agent reinforcement learning (MARL) as a computational model of AI agents, we conduct experiments in two mixed-motive environments: Commons Harvest, which models resource sustainability, and Allelopathic Harvest, which involves coordination under conflicting incentives. Our results demonstrate that the altar enables agents to adapt effectively to dynamic norms, engage in accurate sanctioning, and achieve higher social welfare than systems without a classification institution. These findings highlight the importance of normative institutions in fostering cooperative, adaptable AI agents operating in complex real-world settings.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12670