Altared Environments: The Role of Normative Infrastructure in AI Alignment

Published: 18 Jun 2024, Last Modified: 16 Jul 2024
Venue: Agentic Markets @ ICML'24 (Poster)
License: CC BY 4.0
Keywords: Normative Institutions, Multi-Agent Reinforcement Learning, Cooperative AI, Mixed-Motive Games
Abstract: Cooperation is central to human life, distinguishing humans as ultra-cooperative among mammals. We form stable groups that enhance welfare through mutual protection, knowledge sharing, and economic exchanges. As artificial intelligence gains autonomy in shared environments, ensuring AI agents can engage in cooperative behaviors is crucial. Research in AI views this as an alignment challenge and frames it in terms of embedding norms and values in AI systems. Such an approach, while promising, neglects how humans achieve stable cooperation through \textit{normative infrastructure}. This infrastructure establishes shared norms enforced by agents who recognize and sanction norm violations. Using multi-agent reinforcement learning (MARL), we investigate the impact of normative infrastructure on agents' learning dynamics and their cooperative abilities in mixed-motive games. We introduce the concept of an \textit{\textbf{altar}}, an environmental feature that encodes actions deemed sanctionable by a group of agents. Comparing the performance of simple, independent learning agents in environments with and without the altar, we assess the potential of normative infrastructure in facilitating AI agent alignment to foster stable cooperation.
Submission Number: 35