Reducing overestimation with attentional multi-agent twin delayed deep deterministic policy gradient
Abstract: In multi-agent reinforcement learning, establishing effective communication protocols is crucial for enhancing agent collaboration. However, traditional communication methods face challenges in scalability and efficiency as the number of agents increases, because the dimensions of the observation and action spaces expand accordingly. This leads to heightened resource consumption and degraded performance in large multi-agent scenarios. To address these issues, we introduce a novel Attentional Multi-agent Twin Delayed Deep Deterministic Policy Gradient (AMATD3) algorithm that incorporates an attentional communication policy gradient approach. This approach selectively initiates communication through an attention unit that assesses whether information exchange among agents is necessary, combined with a communication module that effectively integrates the essential information. By employing a double-Q function, AMATD3 further mitigates the overestimation and suboptimal policy choices found in existing methods, improving the algorithm's accuracy while reducing communication overhead. Our algorithm demonstrates superior performance in the StarCraft II environment, achieving higher cumulative rewards and higher task success rates than existing algorithms. For example, AMATD3 yields reward values of 16.908 and 6.858 in the 8m and 25m scenarios, respectively, more than double the rewards achieved by the other methods. These results confirm the algorithm's efficiency and effectiveness in complex multi-agent settings, contributing to the ongoing development of scalable and efficient communication protocols in artificial intelligence.
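To illustrate the double-Q idea mentioned above, the sketch below shows a TD3-style clipped double-Q target, where the Bellman target uses the minimum of two target critics to curb upward bias in the value estimate. This is a minimal, generic illustration, not the authors' implementation: the class and function names (`TwinCritic`, `clipped_double_q_target`), network sizes, and tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn as nn


class TwinCritic(nn.Module):
    """Two independent Q-networks over a joint observation-action input."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()

        def make_q() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        self.q1 = make_q()
        self.q2 = make_q()

    def forward(self, obs_act: torch.Tensor):
        # Return both Q estimates for the same input.
        return self.q1(obs_act), self.q2(obs_act)


def clipped_double_q_target(target_critic: TwinCritic,
                            next_obs_act: torch.Tensor,
                            reward: torch.Tensor,
                            done: torch.Tensor,
                            gamma: float = 0.99) -> torch.Tensor:
    """Bellman target using the minimum of the two target critics,
    which reduces the overestimation of a single max-based estimate."""
    with torch.no_grad():
        q1, q2 = target_critic(next_obs_act)
        min_q = torch.min(q1, q2)          # clipped double-Q estimate
        return reward + gamma * (1.0 - done) * min_q
```

In a TD3-style setup, both online critics are regressed toward this shared target, and the actor update is delayed (performed less frequently) and driven by only one of the critics; the abstract's claim of reduced overestimation rests on this minimum over the twin critics.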