Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
TL;DR: We propose ACAC, an algorithm designed to enhance learning efficiency in asynchronous MARL by eliminating padding, leading to faster convergence and improved performance.
Abstract: Multi-Agent Reinforcement Learning (MARL) struggles with coordination in sparse-reward environments. Macro-actions, sequences of actions executed as single decisions, facilitate long-term planning but introduce asynchrony, complicating Centralized Training with Decentralized Execution (CTDE). Existing CTDE methods handle asynchrony with padding, which risks misaligning asynchronous experiences and introducing spurious correlations. We propose the Agent-Centric Actor-Critic (ACAC) algorithm to manage asynchrony without padding. ACAC uses agent-centric encoders to process each agent's trajectory independently, and an attention-based aggregation module integrates these histories into a centralized critic for improved temporal abstraction. The resulting architecture is trained with a PPO-based algorithm that uses a Generalized Advantage Estimation modified for asynchronous environments. Experiments show that ACAC accelerates convergence and improves performance over baselines on complex MARL tasks.
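
Illustrative sketch (not from the paper page): a minimal PyTorch module, based only on the abstract's description, of how per-agent encoders and attention-based aggregation could feed a centralized critic without cross-agent padding. The class and parameter names, the shared GRU encoder, and the mean-pooled value head are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AgentCentricCritic(nn.Module):
    """Sketch: each agent's macro-action history is encoded on its own timeline,
    then an attention module aggregates the per-agent summaries into a
    centralized value estimate (no padding across agents)."""

    def __init__(self, obs_dim: int, hidden_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # A single shared encoder is assumed here; per-agent encoders are equally possible.
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, agent_histories: list[torch.Tensor]) -> torch.Tensor:
        # agent_histories[i]: (batch, T_i, obs_dim); each agent keeps its own
        # history length T_i, so no alignment or filler steps are needed.
        summaries = []
        for hist in agent_histories:
            _, h_n = self.encoder(hist)           # h_n: (1, batch, hidden_dim)
            summaries.append(h_n.squeeze(0))      # (batch, hidden_dim)
        agents = torch.stack(summaries, dim=1)    # (batch, n_agents, hidden_dim)
        mixed, _ = self.attn(agents, agents, agents)
        return self.value_head(mixed.mean(dim=1))  # (batch, 1) centralized value
```

Because each agent's trajectory is summarized independently before aggregation, the critic sees joint information without any artificial padding tokens; the actual ACAC architecture and its modified advantage estimation are described in the paper.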
Lay Summary: In many real-world tasks, multiple intelligent agents need to work together to achieve a shared goal—like preparing a meal in a game such as Overcooked. These agents only receive occasional signals about how well they’re doing, which makes it difficult to learn effective teamwork. To speed up learning, they can use “macro-actions”—longer, high-level plans like “go to the tomato”—instead of taking one small step at a time. But since different agents take different amounts of time to complete these actions, they can fall out of sync, making coordination and learning much harder. Existing methods try to handle this by filling in missing information when an agent is busy. However, this can create confusing or misleading training data. Our method, called Agent-Centric Actor-Critic (ACAC), avoids this problem. Each agent records its own actions and timing independently. During training, a central module integrates these timelines in a coordinated way—without needing any artificial filler—while each agent still learns to act independently. Experiments show that ACAC leads to faster learning and better coordination—especially when rewards are rare and teamwork is essential.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Multi-Agent Reinforcement Learning, Asynchronous Multi-Agent Reinforcement Learning, MacDec-POMDP
Submission Number: 8327