Keywords: Collective Agency, Causal Abstraction, Causal Incentives, AI Safety
TL;DR: Combining causal games and causal abstraction for a mathematical foundation of collective agency
Abstract: A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents
might inadvertently form a collective agent with capabilities and goals distinct from those of any
individual. More generally, determining when a group of agents can be viewed as a unified collective
agent is a foundational question in the study of interactions and incentives in both biological
and artificial systems. We adopt a behavioral perspective in answering this question, ascribing
collective agency to a group when viewing the group’s joint actions as rational and goal-directed
successfully predicts its behavior. We formalize this perspective on collective agency using causal
games (Hammond et al., 2023) – which are causal models of strategic, multi-agent interactions –
and causal abstraction (Rubenstein et al., 2017; Beckers and Halpern, 2019) – which formalizes
when a simple, high-level model faithfully captures a more complex, low-level model. We use this
framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make
quantitative assessments of the degree of collective agency exhibited by different voting mechanisms.
Our framework aims to provide a foundation for theoretical and empirical work to understand,
predict, and control emergent collective agents in multi-agent AI systems.
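To make the abstraction condition concrete, the following is a minimal toy sketch (not the paper's formalism) of the commutation requirement behind causal abstraction: abstracting a low-level outcome must agree with running the high-level model on the abstracted state. The aggregation map `tau` (here, the collective "acts" only when both agents do) and both models are hypothetical illustrations.

```python
from itertools import product

# Low-level model: two agents each choose a binary action; the
# low-level "state" is simply the pair of individual actions.
def low_level(a1, a2):
    return (a1, a2)

# tau maps a low-level state to a high-level collective action.
# Hypothetical choice: the collective acts (1) iff both agents act.
def tau(state):
    a1, a2 = state
    return a1 and a2

# High-level model: a single collective agent choosing A directly.
def high_level(A):
    return A

# Commutation check, in the spirit of exact transformations
# (Rubenstein et al., 2017): for every low-level setting,
# "run then abstract" must equal "abstract then run".
def is_abstraction():
    for a1, a2 in product([0, 1], repeat=2):
        abstract_after_run = tau(low_level(a1, a2))
        run_after_abstract = high_level(tau((a1, a2)))
        if abstract_after_run != run_after_abstract:
            return False
    return True

print(is_abstraction())  # → True
```

In this toy case the check passes trivially because the high-level mechanism is the identity; the substance of the framework lies in checking such commutation across interventions in richer causal games.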
Submission Number: 72