Multi-Agent Decision S4: Leveraging State Space Models for Offline Multi-Agent Reinforcement Learning

Published: 25 Feb 2025 (Last Modified: 25 Feb 2025)
Venue: MARW at AAAI 2025
License: CC BY 4.0
Keywords: reinforcement learning, offline reinforcement learning, multi-agent reinforcement learning, sequence-to-sequence supervised learning
TL;DR: This work proposes a sequence-to-sequence supervised-learning-based offline multi-agent reinforcement learning algorithm that leverages Structured State Space Sequence (S4) models to achieve scalable cooperation with minimal inter-agent communication.
Abstract: Goal-conditioned sequence-based supervised learning with transformers has shown promise in offline reinforcement learning (RL) for single-agent settings. However, extending these methods to offline multi-agent RL (MARL) remains challenging. Existing transformer-based MARL approaches either train agents independently, neglecting multi-agent system dynamics, or rely on centralized transformer models, which face scalability issues. Moreover, transformers inherently struggle with long-range dependencies and incur high computational cost. Building on the recent success of Structured State Space Sequence (S4) models, known for their parameter efficiency, faster inference, and superior handling of long context lengths, we propose a novel application of S4-based models to offline MARL tasks. Our method utilizes S4’s efficient convolutional view for offline training and its recurrent dynamics for fast on-policy fine-tuning. To foster scalable cooperation between agents, we sequentially expand the decision-making process, allowing agents to act one after another at each time step. This design promotes bi-directional cooperation, enabling agents to share information via their S4 latent states or memory with minimal communication. Gradients also flow backward through this shared information, linking each agent’s learning to that of its predecessor. Experiments on challenging MARL benchmarks, including Multi-Robot Warehouse (RWARE) and StarCraft Multi-Agent Challenge (SMAC), demonstrate that our approach significantly outperforms state-of-the-art offline RL and transformer-based MARL baselines across most tasks.
Submission Number: 22
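
To make the sequential, memory-sharing decision scheme from the abstract concrete, below is a minimal PyTorch sketch. It is not the authors' implementation: a simplified diagonal SSM stands in for a full S4 layer (whose equivalent convolutional view would be used for parallel offline training, while the recurrence shown here matches the fast on-policy rollout), and all names (`SSMAgent`, `msg_proj`, `sequential_step`) and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SSMAgent(nn.Module):
    """One agent: a simplified diagonal SSM layer plus an action head.

    Recurrent view: x_k = A_bar * x_{k-1} + dt * B u_k ; y_k = C x_k.
    A full S4 layer also admits an equivalent convolutional view for
    parallel offline training; only the recurrence is sketched here.
    """
    def __init__(self, d_obs: int, d_state: int, n_actions: int):
        super().__init__()
        # Diagonal transition, kept stable via a negative real part.
        self.log_neg_A = nn.Parameter(torch.zeros(d_state))
        self.B = nn.Parameter(torch.randn(d_state, d_obs) * 0.1)
        self.C = nn.Parameter(torch.randn(d_obs, d_state) * 0.1)
        # Hypothetical projection fusing the predecessor's latent state.
        self.msg_proj = nn.Linear(d_state, d_obs)
        self.head = nn.Linear(d_obs, n_actions)
        self.dt = 0.01  # fixed discretization step for simplicity

    def step(self, obs, x, msg=None):
        """One recurrent step: obs is (batch, d_obs), x is (batch, d_state)."""
        if msg is not None:
            obs = obs + self.msg_proj(msg)  # inject predecessor's memory
        A_bar = torch.exp(-torch.exp(self.log_neg_A) * self.dt)
        x = A_bar * x + self.dt * (obs @ self.B.T)
        logits = self.head(x @ self.C.T)
        return logits, x


def sequential_step(agents, observations, states):
    """One environment time step: agents decide one after another, each
    conditioning on its predecessor's S4 latent state. Gradients flow
    backward through `msg`, linking each agent's learning to the previous
    agent's, as described in the abstract."""
    actions, new_states, msg = [], [], None
    for agent, obs, x in zip(agents, observations, states):
        logits, x = agent.step(obs, x, msg)
        actions.append(logits.argmax(dim=-1))
        new_states.append(x)
        msg = x  # pass this agent's memory to the next agent
    return actions, new_states
```

A quick usage example under the same assumptions: three agents with 32-dimensional observations and 5 discrete actions would be rolled out as `agents = [SSMAgent(32, 16, 5) for _ in range(3)]`, `actions, states = sequential_step(agents, [torch.randn(1, 32)] * 3, [torch.zeros(1, 16)] * 3)`, with `states` carried forward to the next time step. The latent state doubles as the inter-agent message, which is what keeps communication minimal.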