Robust Multi-Agent Reinforcement Learning with Diverse Adversarial Agent Generation and Contrastive Policy Encoding
Keywords: Multi-agent Reinforcement Learning, Robust Coordination, Adversarial Training, Contrastive Representation Learning
TL;DR: We propose a robust MARL training framework that co-trains cooperative agents with a diverse adversarial policy generator and a contrastive policy encoder to improve generalization and robustness of multi-agent coordination.
Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising approach for learning coordination policies in multi-agent systems (MAS). However, policies trained by conventional MARL algorithms often overfit to specific team behaviors, which limits their ability to generalize and remain robust when faced with teammate failures or adversarial interventions. These limitations pose significant challenges to deploying MARL in real-world applications. To address these issues, we propose a novel co-evolutionary robust MARL framework that improves the robustness and generalization of MARL algorithms against policy disturbances and adversarial agents within MAS. Our framework comprises two key components: (1) DAAG: a Diverse Adversarial Agent Generator optimized via an information-theoretic objective to produce behaviorally diverse and challenging adversarial agents, and (2) CAPE: a Contrastive learning-based Agent Policy Encoder that continuously learns informative representations of the adversarial agents’ policies encountered during training. These representations are integrated into the MARL agents’ policy learning process to enable dynamic adaptation to diverse and evolving adversarial policies. The two components are optimized in a co-evolutionary training paradigm, enabling cooperative agents to co-adapt robustly alongside increasingly diverse adversaries. Comprehensive experiments on the Predator-Prey and SMAC benchmarks demonstrate that our framework significantly outperforms baseline methods in both robustness and generalization.
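As an illustration of the contrastive policy encoding idea described in the abstract, the sketch below computes an InfoNCE-style loss over a batch of policy embeddings. The abstract does not specify CAPE's loss, architecture, or sampling scheme, so everything here is an assumption: the function name `info_nce_loss`, the `temperature` parameter, and the pairing rule (row i of `anchors` and row i of `positives` are taken to embed two trajectory segments from the same adversarial policy, with all other rows acting as negatives) are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embedding pairs.

    Assumption (not from the paper): anchors[i] and positives[i] embed two
    trajectory segments generated by the same adversarial policy, and every
    other row in the batch serves as a negative example.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (batch, batch), scaled by temperature.
    logits = a @ p.T / temperature

    # Subtract the row-wise maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

Minimizing a loss of this kind pulls embeddings of the same policy together and pushes embeddings of different policies apart, which is one common way to obtain representations that distinguish adversary behaviors. How such embeddings would then condition the cooperative agents' policies is left out of this sketch.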
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19897