Keywords: multi-agent reinforcement learning, causal influence, Gumbel-max attention
Abstract: In this paper, we define A-Q influence to capture the state-dependent causal influence relationship between individual actions and individual action-value functions in a MARL problem. We then construct influence-based local value functions (ILVFs) and show that they are equivalent to the global value function in terms of policy gradient estimation. To efficiently obtain agent-wise A-Q influence, we propose to infer A-Q influence from state influence, which is learned by a Gumbel-max attention mechanism. To evaluate the effectiveness of ILVF, we integrate it into the MAPPO framework and propose the ILVF-P algorithm. Extensive experiments on diverse MARL benchmarks show that ILVF-P consistently surpasses strong baselines, underscoring its benefit in improving training efficiency.
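The Gumbel-max attention mentioned in the abstract presumably builds on the standard Gumbel-max trick, which turns a deterministic argmax over scores into an exact sample from the corresponding softmax distribution. The sketch below illustrates only that underlying trick (the logits, agent count, and sampling loop are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax(logits + Gumbel noise) is an exact
    sample from Categorical(softmax(logits))."""
    # Gumbel(0, 1) noise via inverse transform of uniform samples
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return int(np.argmax(logits + gumbel))

rng = np.random.default_rng(0)
# Hypothetical attention scores over 3 agents (illustrative values)
logits = np.array([2.0, 0.5, -1.0])

# Empirically check that sampling frequencies match softmax(logits)
counts = np.zeros(3)
for _ in range(10000):
    counts[gumbel_max_sample(logits, rng)] += 1
probs = counts / counts.sum()
softmax = np.exp(logits) / np.exp(logits).sum()
```

In practice, attention variants of this trick often use a straight-through or Gumbel-softmax relaxation so gradients can flow through the discrete selection; the hard argmax above shows only the sampling side.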
Primary Area: causal reasoning
Submission Number: 17436