TL;DR: We develop a state-of-the-art, memory-efficient, and scalable cooperative multi-agent reinforcement learning algorithm that leverages retention.
Abstract: As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modelling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long-context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate that **Sable significantly outperforms existing state-of-the-art methods on a large number of diverse tasks (34 out of 45 tested)**. Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting only a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and to confirm its efficient computational memory usage. **All experimental data, hyperparameters, and code for the frozen version of Sable used in this paper are available on our website:** https://sites.google.com/view/sable-marl. **An improved and maintained version of Sable is available in Mava:** https://github.com/instadeepai/Mava.
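To make the underlying mechanism concrete, below is a minimal sketch of single-head retention as described by Sun et al. (2023), in its equivalent recurrent and parallel forms. The function names and shapes here are illustrative only and do not reflect Sable's actual implementation (see the Mava repository for the maintained version).

```python
import jax.numpy as jnp


def recurrent_retention(q_t, k_t, v_t, s_prev, gamma):
    """One retention step in recurrent form (Sun et al., 2023).

    q_t, k_t: (d_k,) query/key for the current step; v_t: (d_v,) value.
    s_prev: (d_k, d_v) running state; gamma: scalar decay in (0, 1).
    """
    # Decay the summarised past and add the current key-value outer product:
    # per-step cost and memory are constant, regardless of history length.
    s_t = gamma * s_prev + jnp.outer(k_t, v_t)
    o_t = q_t @ s_t
    return o_t, s_t


def parallel_retention(Q, K, V, gamma):
    """Parallel form over a full sequence; computes the same outputs.

    Q, K: (n, d_k); V: (n, d_v); returns outputs of shape (n, d_v).
    """
    n = Q.shape[0]
    idx = jnp.arange(n)
    lag = idx[:, None] - idx[None, :]
    # D[t, m] = gamma^(t - m) for m <= t, else 0 (causal decay mask).
    D = jnp.where(lag >= 0, gamma ** jnp.maximum(lag, 0), 0.0)
    return (Q @ K.T * D) @ V
```

The equivalence of these two forms is what retention trades on: the parallel form permits efficient training over whole sequences, while the recurrent form gives constant-memory execution at inference time, which is what enables long-context temporal reasoning without the quadratic cost of attention.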
Lay Summary: In cooperative multi-agent reinforcement learning (MARL), we train a team of agents to work together toward a shared goal. This is essential for complex real-world systems such as robotic warehouses, where a group of robots must navigate shared spaces efficiently. However, as the number of agents grows and tracking historical information becomes increasingly important, conventional methods often struggle due to rising computational costs and the difficulty of maintaining effective coordination.
In our work, we introduce **Sable**, a novel sequence modelling algorithm for MARL, designed specifically to handle environments that require temporal memory and support a large number of agents, while remaining computationally efficient. Unlike attention-based methods, which struggle to retain historical information and whose computational cost grows rapidly with the number of agents, Sable uses a retention-based mechanism. This mechanism allows Sable to process long sequences efficiently and scale to environments with hundreds of agents, all while maintaining strong performance and manageable computational cost.
**Our extensive experiments across 45 diverse tasks show that Sable significantly outperforms existing state-of-the-art MARL algorithms.** It remains effective even as the number of agents scales into the hundreds, a scale that remains out of reach for many existing methods. By extending the memory capacity of multi-agent systems without sacrificing computational efficiency, Sable opens the door to more intelligent and scalable AI systems for real-world cooperative challenges.
Link To Code: https://sites.google.com/view/sable-marl
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Multi-agent reinforcement learning, Reinforcement learning, Sequence modelling, Linear recurrent models, Retentive Networks
Submission Number: 9652