Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning

ICLR 2026 Conference Submission 15895 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: action-dependent policy, multi-agent reinforcement learning, global optimality, coordination graph
Abstract: Action-dependent policies, which condition decisions on both states and other agents' actions, provide a powerful alternative to independent policies in multi-agent reinforcement learning. Most existing studies focus on auto-regressive formulations, in which each agent's policy depends on the actions of all preceding agents. However, this approach suffers from severe scalability limitations as the number of agents grows. In contrast, sparse dependency structures, where each agent relies only on a subset of other agents, remain largely unexplored and lack rigorous theoretical foundations. To address this gap, we introduce the action dependency graph (ADG) to model sparse inter-agent action dependencies. We prove that action-dependent policies converge to solutions stronger than the Nash equilibria that often trap independent policies; we call such solutions $G_d$-locally optimal policies. Furthermore, for problems structured by a coordination graph (CG), we show that a $G_d$-locally optimal policy attains global optimality when the ADG satisfies specific CG-induced conditions. To substantiate our theory, we develop a tabular policy iteration algorithm that converges exactly as predicted. We further extend a standard deep MARL method to incorporate action-dependent policies, confirming the practical relevance of our framework.
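The abstract's central object, the ADG, can be read as a directed acyclic graph over agents in which each agent's policy conditions on the state and on the already-chosen actions of its parents. As a rough illustration only (the graph, the toy policies, and all function names below are hypothetical and not taken from the paper), a joint action under such a sparse action-dependent policy can be sampled by visiting agents in topological order:

```python
import random

# Hypothetical action dependency graph (ADG): agent -> list of parent agents
# whose chosen actions its policy may condition on.
ADG = {0: [], 1: [0], 2: [0], 3: [1, 2]}


def topological_order(adg):
    """Return an agent ordering in which every parent precedes its children."""
    order, visited = [], set()

    def visit(i):
        if i in visited:
            return
        visited.add(i)
        for parent in adg[i]:
            visit(parent)
        order.append(i)

    for agent in adg:
        visit(agent)
    return order


def sample_joint_action(state, policies, adg):
    """Sample a joint action by querying each agent's action-dependent policy
    with the state and the already-sampled actions of its ADG parents."""
    actions = {}
    for i in topological_order(adg):
        parent_actions = {p: actions[p] for p in adg[i]}
        actions[i] = policies[i](state, parent_actions)
    return actions


def make_toy_policy(n_actions=3):
    """Toy stand-in for a learned policy: agents with parents copy one parent's
    action (a crude coordination rule); root agents act uniformly at random."""
    def policy(state, parent_actions):
        if parent_actions:
            return next(iter(parent_actions.values()))
        return random.randrange(n_actions)
    return policy


policies = {i: make_toy_policy() for i in ADG}
print(sample_joint_action(state=0, policies=policies, adg=ADG))
```

Because each agent conditions only on its ADG parents rather than on all preceding agents, the per-agent input size scales with the in-degree of the graph instead of the total number of agents, which is the scalability argument made in the abstract.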
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 15895