Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPs

10 Oct 2022 (modified: 17 Sept 2024) | Rejected by TMLR | CC BY 4.0

Abstract: Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn decentralized best-response policies via individual advantage functions. Learning is stabilized by a centralized critic whose primary objective is to mitigate the moving-target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is discarded after training. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and scales better with the number of agents, opening up a promising new direction for MARL research.
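
To illustrate why a critic can be dropped once training ends, here is a minimal sketch. It assumes the dueling decomposition takes the common form Q(a) = V + A(a), with V supplied by the centralized critic and A by the agent's own advantage network; the abstract does not spell out the exact form, and the function names and numbers below are hypothetical, not taken from the paper.

```python
from typing import List


def combine_dueling(value: float, advantages: List[float]) -> List[float]:
    """Assumed dueling form: Q(a) = V + A(a) for every action a."""
    return [value + a for a in advantages]


def greedy_action(scores: List[float]) -> int:
    """Index of the highest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical numbers for a single agent with three actions.
local_advantages = [-0.4, 0.0, -1.2]  # stand-in for the agent's advantage output
centralized_value = 3.7               # stand-in for the centralized critic's output

q_values = combine_dueling(centralized_value, local_advantages)

# V does not depend on the action, so it shifts every Q-value equally
# and the greedy action can be read off the local advantages alone.
assert greedy_action(q_values) == greedy_action(local_advantages)
```

Under this assumption, the value term matters only for computing training targets; at execution time each agent can act greedily on its local advantages.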

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Amir-massoud_Farahmand1

Submission Number: 494
