Action-Conditioned Transformers for Decentralized Multi-Agent World Models

ICLR 2026 Conference Submission 24931 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Reinforcement Learning, Contrastive Learning, World Model
TL;DR: A decentralized transformer world model for multi-agent RL that couples Perceiver global context with action-conditioned contrastive prediction, yielding coherent long-horizon rollouts and stronger teammate coordination.
Abstract: Multi-agent reinforcement learning (MARL) has achieved strong results on large-scale decision making, yet most methods are model-free, which limits sample efficiency and stability under non-stationary teammates. Model-based reinforcement learning (MBRL) can reduce data usage, but planning and search scale poorly with joint action spaces. We adopt a world model approach to long-horizon coordination while avoiding expensive planning. We introduce MACT, a decentralized transformer world model with linear complexity in the number of agents. Each agent processes discretized observation–action tokens with a shared transformer, while a single cross-agent Perceiver step provides global context under centralized training and decentralized execution. MACT achieves long-horizon coordination by coupling the Perceiver-derived global context with an action-conditioned contrastive objective that predicts future latent states several steps ahead given the planned joint-action window, binding team actions to their multi-step dynamics. This yields consistent long-horizon rollouts and stronger team-level coordination. Experiments on the StarCraft Multi-Agent Challenge (SMAC) show that MACT surpasses strong model-free baselines and prior world model variants on most tested maps, with pronounced gains on coordination-heavy scenarios.
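To make the action-conditioned contrastive objective described in the abstract concrete, the sketch below shows one plausible form of such a loss: a predictor maps the current latent state plus a window of planned joint actions to a prediction of the latent several steps ahead, and an InfoNCE-style loss scores the true future latent against in-batch negatives. This is a minimal illustration, not the authors' implementation; the module name `ActionConditionedContrast`, the predictor architecture, and all dimensions are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (not the paper's code): an action-conditioned contrastive
# objective in the spirit described in the abstract. A predictor maps the
# current latent plus a flattened window of planned joint actions to a
# prediction of the latent `window` steps ahead; an InfoNCE loss contrasts
# the true future latent against other latents in the batch.
class ActionConditionedContrast(nn.Module):
    def __init__(self, latent_dim: int, action_dim: int, window: int, temperature: float = 0.1):
        super().__init__()
        self.window = window
        self.temperature = temperature
        # Hypothetical predictor head: current latent + joint-action window -> future latent
        self.predictor = nn.Sequential(
            nn.Linear(latent_dim + window * action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t: torch.Tensor, joint_actions: torch.Tensor, z_future: torch.Tensor) -> torch.Tensor:
        """z_t: (B, D) current latents; joint_actions: (B, window, A) planned joint
        actions (already aggregated over agents); z_future: (B, D) latents `window`
        steps ahead. Returns a scalar InfoNCE loss."""
        query = self.predictor(torch.cat([z_t, joint_actions.flatten(1)], dim=-1))
        query = F.normalize(query, dim=-1)
        keys = F.normalize(z_future, dim=-1)
        logits = query @ keys.t() / self.temperature            # (B, B) similarity matrix
        labels = torch.arange(z_t.size(0), device=z_t.device)   # positives on the diagonal
        return F.cross_entropy(logits, labels)

# Usage sketch with random tensors
if __name__ == "__main__":
    loss_fn = ActionConditionedContrast(latent_dim=64, action_dim=8, window=4)
    z_t = torch.randn(32, 64)
    acts = torch.randn(32, 4, 8)
    z_k = torch.randn(32, 64)
    print(loss_fn(z_t, acts, z_k).item())
```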
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24931