Leveraging Pre-Trained Tacit Model for Efficient Multi-Agent Coordination

17 Sept 2025 (modified: 20 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: multi-agent reinforcement learning, tacit pre-training, exploration efficiency
Abstract: Exploration inefficiency caused by large policy spaces is a common challenge in multi-agent reinforcement learning. Although incorporating prior knowledge has been shown to improve exploration efficiency, existing methods typically encode it as intrinsic rewards, which may violate potential-based reward-shaping conditions, causing policy deviation and hindering optimal policy learning. To address this, we propose a novel two-phase multi-agent learning framework, **PTMC** (**P**re-training **T**acit **M**odel for efficient **C**oordination), comprising a pre-training phase and a coordinated training phase. In the pre-training phase, PTMC trains agents in a decentralized manner, integrating general prior knowledge through tacit rewards and enhancing model scalability by masking opponent information. During the coordinated training phase, the coordinated policy is initialized from the pre-trained tacit model, and a tacit constraint term is incorporated into the optimization objective to preserve advantageous tacit behaviors while enabling task-specific adaptation. Notably, the pre-training phase of PTMC is highly efficient, accounting for only a small fraction of the total training time relative to coordinated training. Experimental results demonstrate that our approach significantly outperforms state-of-the-art baselines in both coordination performance and exploration efficiency.
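The abstract describes initializing the coordinated policy from the pre-trained tacit model and adding a tacit constraint term to the optimization objective. Below is a minimal sketch of how such a phase-two objective could look, assuming the constraint is a KL penalty toward the frozen tacit policy; the constraint form, network shapes, and the coefficient `beta` are illustrative assumptions, not the paper's actual definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, beta = 16, 5, 0.1  # hypothetical sizes and constraint weight

# Phase 1: pre-trained tacit model (frozen here). Phase 2: coordinated policy
# initialized from the tacit model, then adapted to the task.
tacit_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
coord_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
coord_policy.load_state_dict(tacit_policy.state_dict())  # initialize from tacit model
for p in tacit_policy.parameters():
    p.requires_grad_(False)

def coordinated_loss(obs, actions, advantages):
    """Policy-gradient task loss plus a tacit constraint (KL toward the tacit policy)."""
    log_probs = F.log_softmax(coord_policy(obs), dim=-1)
    # Standard advantage-weighted policy-gradient term for the coordination task.
    task_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
    with torch.no_grad():
        tacit_log_probs = F.log_softmax(tacit_policy(obs), dim=-1)
    # KL(coord || tacit): keeps advantageous tacit behaviors while allowing adaptation.
    tacit_constraint = F.kl_div(tacit_log_probs, log_probs.exp(), reduction="batchmean")
    return task_loss + beta * tacit_constraint

# Usage with dummy data.
obs = torch.randn(8, obs_dim)
actions = torch.randint(0, n_actions, (8,))
advantages = torch.randn(8)
coordinated_loss(obs, actions, advantages).backward()
```

A KL-style penalty is only one plausible instantiation of the "tacit constraint term"; the paper's formulation may differ.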
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8247