Keywords: multi-agent reinforcement learning, causal influence, Gumbel-max attention
Abstract: In this paper, we define A-Q influence to capture the state-dependent causal influence relationship between individual actions and individual action-value functions in a MARL problem. We then construct influence-based local value functions (ILVFs) and show that they are equivalent to the global value function in terms of policy gradient estimation. To efficiently obtain agent-wise A-Q influence, we propose to infer A-Q influence from state influence, which is learned by a Gumbel-max attention mechanism. To evaluate the effectiveness of ILVF, we integrate it into the MAPPO framework and propose the ILVF-P algorithm. Extensive experiments on diverse MARL benchmarks show that ILVF-P consistently surpasses strong baselines, underscoring its benefit in improving training efficiency.
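The Gumbel-max attention mentioned in the abstract presumably builds on the standard Gumbel-max trick, which turns a deterministic argmax over scores into an exact sample from the corresponding softmax distribution. The sketch below illustrates only that underlying trick (the logits, agent count, and sampling loop are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax(logits + Gumbel noise) is an exact
    sample from Categorical(softmax(logits))."""
    # Gumbel(0, 1) noise via inverse transform of uniform samples
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return int(np.argmax(logits + gumbel))

rng = np.random.default_rng(0)
# Hypothetical attention scores over 3 agents (illustrative values)
logits = np.array([2.0, 0.5, -1.0])

# Empirically check that sampling frequencies match softmax(logits)
counts = np.zeros(3)
for _ in range(10000):
    counts[gumbel_max_sample(logits, rng)] += 1
probs = counts / counts.sum()
softmax = np.exp(logits) / np.exp(logits).sum()
```

In practice, attention variants of this trick often use a straight-through or Gumbel-softmax relaxation so gradients can flow through the discrete selection; the hard argmax above shows only the sampling side.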
Primary Area: causal reasoning
Submission Number: 17436