Towards Noise‐Robust Multi‐Agent Imitation Learning via Global Credit Sequence Decoding

Published: 20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Robust Reinforcement Learning, Multi‐Agent Learning, Imitation Learning, Credit Assignment
Abstract: Multi-Agent Reinforcement Learning (MARL) has emerged as a promising approach to complex decision-making problems such as multi-agent collaboration. To avoid the difficulty of designing complex reward functions, researchers increasingly adopt imitation learning. Classical methods extend single-agent imitation learning to multi-agent settings by matching the distribution of expert demonstrations. However, noisy or low-quality trajectories within these demonstrations can mislead joint policy optimization, causing significant performance degradation. This study introduces a sequential autoregressive architecture that models global dependencies among agents, enabling adaptive credit assignment and policy optimization. We show theoretically that the architecture increases the variance of joint advantages and rewards, mitigating the vanishing gradients and mode collapse induced by noisy demonstrations. Experiments show that our method achieves significant performance improvements on multiple benchmarks.
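The abstract's core idea, sequentially decoding per-agent credits so that each agent's share of a joint reward is conditioned on the credits already assigned, can be illustrated with a minimal sketch. Everything below is hypothetical: the function `decode_credits`, the linear scoring weights `w`, and the choice of a softmax split are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def decode_credits(joint_reward, agent_feats, w):
    """Hypothetical sketch of autoregressive credit decoding.

    Each agent's logit depends on its own features plus the running sum
    of logits already produced for earlier agents (a stand-in for the
    global autoregressive context); a softmax then splits joint_reward
    into per-agent credits that sum to the joint reward.
    """
    n = len(agent_feats)
    logits = np.zeros(n)
    context = 0.0  # accumulated signal from agents decoded so far
    for i, feat in enumerate(agent_feats):
        # condition this agent's score on earlier agents' scores
        logits[i] = w @ np.append(feat, context)
        context += logits[i]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return joint_reward * weights

# Toy usage: three agents, two-dimensional features, one joint reward.
feats = [np.array([0.5, 0.1]), np.array([0.2, 0.9]), np.array([0.0, 0.3])]
w = np.array([1.0, -0.5, 0.2])  # last entry weights the decoded context
credits = decode_credits(2.0, feats, w)
```

Because each step conditions on the running context, permuting the agent order generally changes the split, which is what distinguishes a sequential decoder from an independent per-agent critic.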
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 23925