Altared Environments: The Role of Normative Infrastructure in AI Alignment

ICLR 2025 Conference Submission12670 Authors

28 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Institutions, Norms, Cooperative AI, Multi-agent Systems, Reinforcement Learning, Alignment
TL;DR: Normative institutions can play a key role in driving alignment and fostering cooperation among AI agents
Abstract: Cooperation is central to human societies, which achieve it by constantly tackling the alignment problem of ensuring that self-interested individuals act in ways that benefit the groups in which they live. As AI agents become pervasive in shared environments, it will be similarly crucial for them to align with the cooperative goals of human groups. Current AI alignment research largely focuses on embedding specified or learned norms into agents to achieve this cooperation. While valuable, this approach overlooks the role that institutions play in aligning human behavior to achieve cooperative gains, and thus overlooks a potential alignment technique for AI agents. We address this gap by proposing Altared Games, a novel formal extension of Markov games that incorporates an altar—a classification institution providing explicit normative guidance to agents. Our approach focuses on a challenging setting where norms are dynamic, requiring agents to adapt to the evolving norm content represented by the altar. Using multi-agent reinforcement learning (MARL) as a computational model of AI agents, we conduct experiments in two mixed-motive environments: Commons Harvest, which models resource sustainability, and Allelopathic Harvest, which involves coordination under conflicting incentives. Our results demonstrate that the altar enables agents to adapt effectively to dynamic norms, engage in accurate sanctioning, and achieve higher social welfare compared to systems without a classification institution. These findings highlight the importance of normative institutions in fostering cooperative, adaptable AI agents operating in complex real-world settings.
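The abstract's core construct—a Markov game augmented with an "altar" that classifies behavior against a dynamic norm—can be sketched minimally as follows. All class names, the altar interface, and the toy harvest norm are illustrative assumptions, not the paper's actual formalism:

```python
# Hypothetical sketch of an "Altared Game": a Markov game augmented with an
# altar, i.e. a classification institution that labels actions as
# norm-compliant or norm-violating under the currently active norm.
# Every name here is an assumption made for illustration.
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int   # e.g. remaining resource stock in a harvest environment
Action = int  # e.g. 0 = abstain, 1 = harvest


@dataclass
class Altar:
    """Classification institution: judges (state, action) pairs against
    whichever norm is currently in force. Norms are dynamic."""
    norms: Dict[str, Callable[[State, Action], bool]]
    active: str

    def classify(self, state: State, action: Action) -> bool:
        # True if the action complies with the active norm.
        return self.norms[self.active](state, action)

    def update_norm(self, norm_id: str) -> None:
        # Dynamic norm content: the institution swaps the norm in force,
        # and agents must adapt to the new classification.
        self.active = norm_id


@dataclass
class AltaredGame:
    n_agents: int
    altar: Altar

    def compliance_signal(self, state: State, actions: List[Action]) -> List[bool]:
        # Explicit normative guidance: per-agent compliance labels that
        # could ground accurate sanctioning by other agents.
        return [self.altar.classify(state, a) for a in actions]


# Toy usage: a sustainability norm forbidding harvesting (action 1)
# when the resource stock (state) is low.
altar = Altar(
    norms={
        "sustainable": lambda s, a: not (a == 1 and s < 3),
        "permissive": lambda s, a: True,
    },
    active="sustainable",
)
game = AltaredGame(n_agents=2, altar=altar)
print(game.compliance_signal(state=2, actions=[1, 0]))  # [False, True]
altar.update_norm("permissive")
print(game.compliance_signal(state=2, actions=[1, 0]))  # [True, True]
```

In this reading, the altar decouples norm content from agent policies: agents query a shared institution rather than each internalizing a fixed norm, which is what lets the same trained agents track norm changes at test time.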
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12670