Action-Conditioned Transformers for Decentralized Multi-Agent World Models

ICLR 2026 Conference Submission 24931 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Reinforcement Learning, Contrastive Learning, World Model
TL;DR: A decentralized transformer world model for multi-agent RL that couples Perceiver global context with action-conditioned contrastive prediction, yielding coherent long-horizon rollouts and stronger teammate coordination.
Abstract: Multi-agent reinforcement learning (MARL) has achieved strong results on large-scale decision making, yet most methods are model-free, which limits sample efficiency and stability under non-stationary teammates. Model-based reinforcement learning (MBRL) can reduce data usage, but planning and search scale poorly with joint action spaces. We adopt a world model approach to long-horizon coordination while avoiding expensive planning. We introduce MACT, a decentralized transformer world model with linear complexity in the number of agents. Each agent processes discretized observation–action tokens with a shared transformer, while a single cross-agent Perceiver step provides global context under centralized training and decentralized execution. MACT achieves long-horizon coordination by coupling the Perceiver-derived global context with an action-conditioned contrastive objective that predicts future latent states several steps ahead given the planned joint-action window, binding team actions to their multi-step dynamics. This yields consistent long-horizon rollouts and stronger team-level coordination. Experiments on the StarCraft Multi-Agent Challenge (SMAC) show that MACT surpasses strong model-free baselines and prior world model variants on most tested maps, with pronounced gains on coordination-heavy scenarios.
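To make the action-conditioned contrastive objective described in the abstract concrete, the sketch below shows one plausible form of such a loss: a predictor maps the current latent state plus a window of planned joint actions to a prediction of the latent several steps ahead, and an InfoNCE-style loss scores the true future latent against in-batch negatives. This is a minimal illustration, not the authors' implementation; the module name `ActionConditionedContrast`, the predictor architecture, and all dimensions are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (not the paper's code): an action-conditioned contrastive
# objective in the spirit described in the abstract. A predictor maps the
# current latent plus a flattened window of planned joint actions to a
# prediction of the latent `window` steps ahead; an InfoNCE loss contrasts
# the true future latent against other latents in the batch.
class ActionConditionedContrast(nn.Module):
    def __init__(self, latent_dim: int, action_dim: int, window: int, temperature: float = 0.1):
        super().__init__()
        self.window = window
        self.temperature = temperature
        # Hypothetical predictor head: current latent + joint-action window -> future latent
        self.predictor = nn.Sequential(
            nn.Linear(latent_dim + window * action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t: torch.Tensor, joint_actions: torch.Tensor, z_future: torch.Tensor) -> torch.Tensor:
        """z_t: (B, D) current latents; joint_actions: (B, window, A) planned joint
        actions (already aggregated over agents); z_future: (B, D) latents `window`
        steps ahead. Returns a scalar InfoNCE loss."""
        query = self.predictor(torch.cat([z_t, joint_actions.flatten(1)], dim=-1))
        query = F.normalize(query, dim=-1)
        keys = F.normalize(z_future, dim=-1)
        logits = query @ keys.t() / self.temperature            # (B, B) similarity matrix
        labels = torch.arange(z_t.size(0), device=z_t.device)   # positives on the diagonal
        return F.cross_entropy(logits, labels)

# Usage sketch with random tensors
if __name__ == "__main__":
    loss_fn = ActionConditionedContrast(latent_dim=64, action_dim=8, window=4)
    z_t = torch.randn(32, 64)
    acts = torch.randn(32, 4, 8)
    z_k = torch.randn(32, 64)
    print(loss_fn(z_t, acts, z_k).item())
```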
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24931