Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization
Keywords: Multi-Agent Reinforcement Learning, Multi-Agent Credit Assignment Problem, Policy Optimization, LEARN
TL;DR: We introduce GPAE, a per-agent n-step advantage estimator for MARL under CTDE that is policy-invariant and stabilizes off-policy sample reuse via double-truncated importance weights, improving credit assignment, coordination, and sample efficiency.
Abstract: We propose a novel framework for multi-agent reinforcement learning that improves sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by estimating values indirectly through action probabilities, eliminating the need for direct $Q$-function estimation. To further refine estimation, we introduce a double-truncated importance sampling ratio scheme, which improves credit assignment on off-policy trajectories by balancing sensitivity to an agent's own policy changes against robustness to the non-stationarity induced by other agents. Experiments on standard benchmarks demonstrate that our approach outperforms existing methods in coordination and sample efficiency, particularly in complex scenarios.
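To give intuition for the double-truncated importance weighting described above, the sketch below is a minimal, hypothetical Python rendering. The abstract does not spell out GPAE's exact recursion, so this combines a GAE-style backward pass with V-trace-style clipping: `rho_bar` truncates the per-step correction (sensitivity to the agent's own policy change), while `c_bar` truncates the backward trace (robustness to non-stationarity from other agents). All function names and hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (NOT the authors' implementation) of a per-agent,
# n-step advantage estimator with double-truncated importance weights.
# The construction is assumed, following a V-trace-like recipe.
import numpy as np

def per_agent_truncated_advantages(
    rewards,      # (T,) shared team rewards
    values,       # (T+1,) critic values V(s_t), with a bootstrap at index T
    log_pi,       # (T,) log pi_i(a_t^i | s_t) under agent i's current policy
    log_mu,       # (T,) log mu_i(a_t^i | s_t) under the behavior policy
    gamma=0.99,   # discount factor (assumed)
    lam=0.95,     # GAE-style trace decay (assumed)
    rho_bar=1.0,  # truncation for the per-step correction (assumed)
    c_bar=1.0,    # truncation for the backward trace (assumed)
):
    T = len(rewards)
    ratios = np.exp(log_pi - log_mu)   # per-agent importance ratios
    rho = np.minimum(ratios, rho_bar)  # first truncation: step-wise weight
    c = np.minimum(ratios, c_bar)      # second truncation: trace-wise weight
    adv = np.zeros(T)
    gae = 0.0
    # Backward recursion: each TD error is corrected by its own truncated
    # ratio; the running trace is damped by the second truncated ratio.
    for t in reversed(range(T)):
        delta = rho[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        gae = delta + gamma * lam * c[t] * gae
        adv[t] = gae
    return adv

# Usage with random placeholder data for a 10-step trajectory:
rng = np.random.default_rng(0)
T = 10
adv = per_agent_truncated_advantages(
    rewards=rng.normal(size=T),
    values=rng.normal(size=T + 1),
    log_pi=rng.normal(scale=0.1, size=T),
    log_mu=rng.normal(scale=0.1, size=T),
)
print(adv.shape)  # (10,)
```

Separating the two thresholds is what lets the estimator stay sensitive to an agent's own policy updates (via `rho_bar`) while damping variance from stale, off-policy behavior (via `c_bar`); the paper's actual operator may differ in how these weights enter the recursion.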
Area: Learning and Adaptation (LEARN)
Supplementary Material: zip
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1670