Sequence value decomposition transformer for cooperative multi-agent reinforcement learning

Published: 01 Jan 2025 · Last Modified: 26 Sept 2025 · Information Sciences 2025 · CC BY-SA 4.0
Abstract: Existing multi-agent reinforcement learning (MARL) methods built on the centralized training with decentralized execution (CTDE) paradigm have achieved great empirical success in cooperative tasks. However, by evaluating all agents' joint actions simultaneously, the CTDE paradigm struggles to capture the unequal interactions among agents. In this paper, we introduce the concept of action sequences, which captures these unequal interactions from multiple perspectives through different action orderings. We then propose multi-agent sequence value decomposition, which enables a more comprehensive estimation of the joint Q-value function through action sequences. Building on this, we construct a value decomposition transformer (VDT) framework that implements multi-agent sequence value decomposition within the CTDE paradigm. By using a transformer network, the VDT framework performs centralized training over action sequences, thereby enhancing cooperation in coordinated learning. Extensive experiments on the predator-prey task and the StarCraft Multi-Agent Challenge demonstrate that the proposed VDT framework significantly improves learning speed and cooperative performance. Compared with state-of-the-art methods, VDT learns more efficiently within the same number of timesteps and achieves an average 20% improvement in final cooperative performance.
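
To make the idea of mixing per-agent utilities along an action ordering concrete, the snippet below gives a minimal sketch of a transformer-based mixer. It is not the authors' implementation: the module name `SequenceMixer`, the embedding sizes, the pooling over the sequence, and the assumption that the transformer consumes each agent's chosen-action utility in a given ordering and outputs a scalar joint-value estimate are all illustrative assumptions.

```python
# Hypothetical sketch of sequence-based value mixing; names, shapes, and the
# pooling choice are assumptions, not the paper's actual VDT architecture.
import torch
import torch.nn as nn

class SequenceMixer(nn.Module):
    def __init__(self, n_agents: int, embed_dim: int = 32, n_heads: int = 4):
        super().__init__()
        # Embed each agent's chosen-action utility plus its position in the ordering.
        self.utility_embed = nn.Linear(1, embed_dim)
        self.order_embed = nn.Embedding(n_agents, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, 1)  # scalar joint-value estimate

    def forward(self, agent_qs: torch.Tensor, ordering: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen-action utilities
        # ordering: (n_agents,) permutation defining one action sequence
        qs = agent_qs[:, ordering].unsqueeze(-1)              # reorder agents
        pos = torch.arange(len(ordering), device=agent_qs.device)
        tokens = self.utility_embed(qs) + self.order_embed(pos)
        mixed = self.encoder(tokens)                          # attend over the sequence
        return self.head(mixed.mean(dim=1)).squeeze(-1)       # (batch,) joint value

# Usage: different orderings of the same utilities yield different views of
# agent interactions, which is the intuition behind action sequences.
mixer = SequenceMixer(n_agents=3)
q_tot = mixer(torch.randn(8, 3), torch.tensor([2, 0, 1]))
print(q_tot.shape)  # torch.Size([8])
```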