Offline Multi-Agent Reinforcement Learning with Global Moderate Generalization
Keywords: Offline Multi-Agent Reinforcement Learning, Global Moderate Generalization
Abstract: Offline multi-agent reinforcement learning (MARL) suffers from severe value overestimation and extrapolation errors. These challenges lead to over-generalization when value functions or policies encounter out-of-distribution (OOD) actions. The considerable body of work on generalization issues has yielded successful in-sample learning methods, which avoid OOD actions altogether. However, we argue that the conservatism inherent in this approach can impose unnecessary limitations. This study demonstrates that moderate generalization can be both reliable and beneficial for improving performance. Building on this insight, we introduce an offline multi-agent reinforcement learning algorithm with global moderate generalization (OMGMG). OMGMG enforces moderate generalization at the global level and dynamically distributes the generalization effects to individual agents through value decomposition, thereby achieving macro-level control over the generalization process. OMGMG comprises two core components: global moderate action generalization and global moderate generalization propagation. The former improves value function estimation by selecting joint actions within the vicinity of the dataset. The latter ensures the effective propagation of reinforcement learning signals while mitigating erroneous generalization propagation during bootstrapping. Extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks demonstrate that OMGMG surpasses current state-of-the-art offline MARL methods on the majority of tasks.
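To make the abstract's notion of "selecting joint actions within the vicinity of the dataset" concrete, the following minimal PyTorch sketch shows one way a vicinity constraint could be applied to candidate target actions in a continuous joint-action space (as in multi-agent MuJoCo). The function name, the L2 epsilon-ball criterion, and the fallback to the nearest dataset action are illustrative assumptions for exposition only, not the mechanism actually used by OMGMG.

```python
import torch


def moderate_target_actions(candidate_actions: torch.Tensor,
                            dataset_actions: torch.Tensor,
                            epsilon: float) -> torch.Tensor:
    """Hypothetical vicinity filter: keep a candidate joint action only if it lies
    within an L2 epsilon-ball of some joint action in the offline dataset;
    otherwise fall back to the nearest dataset action."""
    # Pairwise distances between candidates (N, d) and dataset actions (M, d).
    dists = torch.cdist(candidate_actions, dataset_actions)      # (N, M)
    nearest_dist, nearest_idx = dists.min(dim=1)                  # (N,)
    nearest = dataset_actions[nearest_idx]                        # (N, d)
    # In-vicinity candidates pass through unchanged; out-of-vicinity ones are replaced.
    in_vicinity = (nearest_dist <= epsilon).unsqueeze(-1)         # (N, 1)
    return torch.where(in_vicinity, candidate_actions, nearest)


if __name__ == "__main__":
    torch.manual_seed(0)
    cand = torch.randn(8, 4)    # 8 candidate joint actions in a 4-dim joint action space
    data = torch.randn(32, 4)   # 32 joint actions drawn from the offline dataset
    safe = moderate_target_actions(cand, data, epsilon=0.5)
    print(safe.shape)           # torch.Size([8, 4])
```

In this sketch the epsilon radius controls how much generalization beyond the dataset is tolerated when forming value targets; the paper's actual global-level mechanism and its propagation through value decomposition are described in the full text.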
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1055