MASTARS: Multi-Agent Sequential Trajectory Augmentation with Return-Conditioned Subgoals

ICLR 2026 Conference Submission 17646 Authors

19 Sept 2025 (modified: 28 Nov 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reinforcement learning, multi-agent, data augmentation, diffusion
Abstract: The performance of offline reinforcement learning (RL) critically depends on the quality and diversity of the offline dataset. While diffusion-based data augmentation for offline RL has shown promise in single-agent settings, its extension to multi-agent systems poses challenges due to the combinatorial complexity of joint modeling and the lack of inter-agent coordination in independent generation. To overcome these issues, we introduce MASTARS, a novel diffusion-based framework that generates coordinated multi-agent trajectories through agent-wise sequential generation. MASTARS employs a diffusion inpainting mechanism, where each agent’s trajectory is generated conditioned on the trajectories of previously sampled agents. This enables fine-grained coordination among agents while avoiding the complexity of high-dimensional joint modeling. To further improve sample quality, MASTARS incorporates return-conditioned subgoals, allowing it to leverage valuable data that might otherwise be discarded. This agent-wise, goal-conditioned approach produces realistic and harmonized multi-agent rollouts, facilitating more effective offline multi-agent RL (MARL) training. Experiments on benchmark environments demonstrate that MASTARS significantly improves the performance of offline MARL algorithms.
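To make the agent-wise sequential generation described above concrete, here is a minimal sketch in Python. It is an illustrative assumption, not the authors' implementation: `denoise_step` stands in for one reverse-diffusion step of a trained per-agent model, and the inpainting-style conditioning on earlier agents and on a return-conditioned subgoal is passed through as a context argument.

```python
import numpy as np

def generate_joint_rollout(denoise_step, num_agents, num_diffusion_steps,
                           subgoals, horizon, state_dim, seed=0):
    """Sketch of agent-wise sequential trajectory generation.

    `denoise_step(traj, t, agent_idx, context, subgoal)` is a hypothetical
    placeholder for one reverse-diffusion step of a trained per-agent model;
    all names and shapes here are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    context = []                                  # trajectories of earlier agents
    for i in range(num_agents):
        traj = rng.standard_normal((horizon, state_dim))   # start from pure noise
        for t in reversed(range(num_diffusion_steps)):
            # Inpainting-style conditioning: previously sampled agents'
            # trajectories (and agent i's return-conditioned subgoal) are held
            # fixed while only agent i's trajectory is denoised.
            traj = denoise_step(traj, t, i, context, subgoals[i])
        context.append(traj)                      # agent i+1 conditions on this
    return np.stack(context)                      # (num_agents, horizon, state_dim)
```

The sequential loop is what distinguishes this scheme from independent per-agent generation: each new agent's sample is constrained by the already-generated agents, which is how coordination is obtained without modeling the full joint trajectory space at once.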
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17646