Keywords: Offline reinforcement learning, Reinforcement learning
Abstract: Offline Multi-Agent Reinforcement Learning (MARL) enables policy learning from static datasets in multi-agent systems, eliminating the need for risky or costly environment interactions during training. A central challenge in offline MARL is achieving effective collaboration among heterogeneous agents under the constraints of fixed datasets, where \textbf{conservatism} is introduced to restrict behaviors to data-supported distributions. Agents with distinct roles and capabilities require individualized degrees of conservatism, yet must still maintain cohesive team performance. However, existing approaches often apply uniform conservatism across all agents, over-constraining critical agents while under-constraining others, which hampers effective collaboration.
To address this issue, a novel framework, \textbf{OMCDA}, is proposed, in which the degree of conservatism is dynamically adjusted for each agent based on its impact on overall system performance. The framework is characterized by two key innovations: (1) a decomposed Q-function architecture that disentangles return computation from policy deviation assessment, allowing precise evaluation of each agent's contribution; and (2) an adaptive conservatism mechanism that scales constraint strength according to both an agent's divergence from the behavior policy and its estimated importance to the system.
Experiments on MuJoCo and SMAC show that OMCDA outperforms existing offline MARL methods, effectively balancing flexibility and conservatism across agents while ensuring fair credit assignment and better collaboration.
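To make the adaptive conservatism idea concrete, a minimal sketch of a per-agent constrained objective is given below; the specific form, the coefficient $\alpha_i$, the importance weight $w_i$, and the behavior policy $\beta_i$ are illustrative assumptions rather than the paper's actual formulation.
\[
\mathcal{L}(\pi_i) \;=\; -\,\mathbb{E}_{o_i \sim \mathcal{D}}\Big[ Q_i\big(o_i, \pi_i(o_i)\big) \Big]
\;+\; \alpha_i\, \mathbb{E}_{o_i \sim \mathcal{D}}\Big[ D_{\mathrm{KL}}\big(\pi_i(\cdot \mid o_i)\,\|\,\beta_i(\cdot \mid o_i)\big) \Big],
\qquad
\alpha_i \;=\; \alpha_0 \,\frac{w_i}{\tfrac{1}{n}\sum_{j=1}^{n} w_j},
\]
where $Q_i$ is agent $i$'s decomposed value estimate, $\beta_i$ the behavior policy recovered from the dataset $\mathcal{D}$, $w_i$ the agent's estimated importance to team performance, and $\alpha_0$ a base conservatism coefficient. Under this sketch, constraint strength grows with both the agent's deviation from the data and its estimated importance; the actual mapping from importance to constraint strength is defined by the paper's method.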
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12485