Abstract: In multi-agent reinforcement learning, value decomposition methods are widely applied to address cooperation among agents. However, some value decomposition methods approximate the global action-value with upper and lower bounds, which leads to a lack of fine-grained cooperative actions. Furthermore, current state-of-the-art value decomposition approaches are largely confined to cooperative learning problems in small-scale multi-agent systems; as the number of agents increases, the Q-value function may have difficulty converging, especially in more complex cooperative scenarios. To address these two challenges, we propose a Group-based QMIX (GQMIX) method that learns to dynamically divide agents into multiple groups during exploration while applying a Graph Attention Network (GAT) to simultaneously learn value decomposition under both global and local observations. This enables agents to be subdivided into different groups in large-scale settings, allowing common subtasks to be learned in complex scenarios and improving the convergence efficiency of the value function. Experimental results on an extended flexible job shop scheduling problem show that the proposed algorithm yields better scheduling solutions and outperforms existing multi-agent reinforcement learning methods in terms of convergence and stability.
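To illustrate the general idea of mixing per-agent Q-values through graph attention and a soft group assignment, a minimal PyTorch sketch is given below. All class names, tensor shapes, and design choices here are assumptions for exposition only; the abstract does not specify the actual GQMIX architecture or hyperparameters.

```python
# Illustrative sketch only: group-based mixing of per-agent Q-values via graph
# attention. The real GQMIX architecture is not detailed in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over agent features (assumed structure)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_agents, in_dim)
        z = self.W(h)                                    # (B, N, out_dim)
        n = z.size(1)
        zi = z.unsqueeze(2).expand(-1, -1, n, -1)        # (B, N, N, out_dim)
        zj = z.unsqueeze(1).expand(-1, n, -1, -1)        # (B, N, N, out_dim)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        alpha = F.softmax(e, dim=-1)                     # attention over agents
        return F.elu(torch.einsum("bij,bjd->bid", alpha, z))


class GroupMixer(nn.Module):
    """Mixes per-agent Q-values into a global Q via a soft group assignment."""

    def __init__(self, n_agents: int, n_groups: int, feat_dim: int):
        super().__init__()
        self.gat = GraphAttentionLayer(feat_dim, feat_dim)
        self.group_logits = nn.Linear(feat_dim, n_groups)   # dynamic grouping head
        self.group_weight = nn.Linear(feat_dim, 1)           # non-negative mixing weight

    def forward(self, agent_qs: torch.Tensor, agent_feats: torch.Tensor) -> torch.Tensor:
        # agent_qs: (B, N), agent_feats: (B, N, feat_dim)
        h = self.gat(agent_feats)
        assign = F.softmax(self.group_logits(h), dim=-1)          # (B, N, G) soft groups
        group_q = torch.einsum("bn,bng->bg", agent_qs, assign)    # per-group values
        w = torch.abs(self.group_weight(h)).mean(dim=1)           # (B, 1), keeps monotonicity
        return w * group_q.sum(dim=-1, keepdim=True)              # (B, 1) global Q_tot
```

Because the group assignment uses a softmax and the mixing weight is taken in absolute value, the resulting global value is monotone in each agent's Q-value, in the spirit of QMIX-style mixers; how grouping and monotonicity are actually enforced in GQMIX may differ.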
External IDs: dblp:conf/case/HongWFSHLX25