Action-Conditioned Transformers for Decentralized Multi-Agent World Models

16 Apr 2026 (modified: 12 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Multi-agent reinforcement learning (MARL) has achieved strong results on large-scale decision making, yet most methods are model-free, limiting sample efficiency and making coordination harder as teammates’ policies evolve during training. Model-based reinforcement learning (MBRL) can reduce data usage, but planning and search scale poorly with joint action spaces. We adopt a world-model approach to long-horizon coordination while avoiding expensive planning. We introduce MACT, a decentralized transformer world model with linear complexity in the number of agents. Each agent processes discretized observation–action tokens with a shared transformer, while a single cross-agent Perceiver step provides global context under centralized training and decentralized execution. MACT targets long-horizon coordination by coupling Perceiver-derived global context with an action-conditioned contrastive objective that predicts future latent representations over a short horizon, conditioned on planned actions. Experiments on the StarCraft Multi-Agent Challenge (SMAC) under tight data budgets show that MACT is competitive with strong model-free baselines and prior world-model variants, with larger gains on coordination-heavy scenarios.
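The action-conditioned contrastive objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the predictor shape, the InfoNCE-style loss, and all function names (`predict_future_latent`, `info_nce_loss`) are assumptions, shown only to make the idea concrete — predict an agent's future latent from its current latent plus planned actions, then score the prediction against the true future latent using other batch elements as negatives.

```python
import numpy as np

def predict_future_latent(z_t, action_seq, W):
    # Hypothetical linear predictor: concatenate the current latent with the
    # flattened planned-action embedding, then project to the future latent
    # space. The real model would use a learned transformer head instead.
    x = np.concatenate([z_t, action_seq.reshape(z_t.shape[0], -1)], axis=1)
    return x @ W

def info_nce_loss(pred, target, temperature=0.1):
    # pred, target: (batch, dim) predicted vs. encoded future latents.
    # L2-normalize so similarity is cosine similarity.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = pred @ target.T / temperature       # (batch, batch) similarities
    # Positives sit on the diagonal; the rest of the batch supplies negatives.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Under this sketch, a well-trained predictor drives the diagonal similarities up relative to the off-diagonal ones, which is what ties the latent dynamics to the planned actions without requiring pixel-level reconstruction or explicit planning.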
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sebastian_Trimpe1
Submission Number: 8465