Value factorization is widely used to design high-quality, scalable multi-agent reinforcement learning algorithms. However, existing methods assume agents execute synchronously, which does not match the asynchronous nature of real-world multi-agent systems, where agents often make decisions at different times and execute asynchronous (macro-)actions of varying and unknown duration. Our work introduces value factorization to this asynchronous framework. To this end, we formalize the consistency requirement between joint and individual macro-action selection, proving that it generalizes the synchronous case. We then propose approaches that use asynchronous centralized information to enable factorization architectures to support macro-actions. We evaluate the resulting asynchronous value factorization algorithms on increasingly complex domains that are standard benchmarks in the macro-action literature. Crucially, the proposed methods scale well on these challenging coordination tasks, where their synchronous counterparts fail.
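For intuition, here is a sketch of what such a consistency requirement could look like; the notation below, with macro-actions \(m^i\) and per-agent macro-observation histories \(\tau^i\), is an illustrative assumption rather than the paper's own formalism. It is the standard synchronous Individual-Global-Max (IGM) condition restated over macro-actions:

\[
\arg\max_{\mathbf{m}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{m}) \;=\; \Big( \arg\max_{m^1} Q_1(\tau^1, m^1),\; \ldots,\; \arg\max_{m^n} Q_n(\tau^n, m^n) \Big),
\]

i.e., the joint greedy macro-action must decompose into each agent's individually greedy macro-action. The synchronous case is then recovered as the special case in which every macro-action terminates after exactly one primitive step.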