Abstract: Multi-agent path finding (MAPF) is one of the core challenges in multi-agent systems. Due to its NP-hard nature, agents are prone to getting stuck in local optima. Meanwhile, existing distributed planning methods often require frequent communication, making them unsuitable for communication-constrained scenarios. To address these challenges, this paper proposes C2Q, a multi-agent reinforcement learning algorithm that integrates Conflict-Based Search (CBS) intervention guidance and curiosity-driven exploration. Built upon the QMIX framework, C2Q introduces a dynamic intervention mechanism, which monitors agents’ recent rewards and adaptively triggers CBS guidance to help agents escape local optima. Additionally, a curiosity-driven mechanism is incorporated to generate intrinsic rewards, encouraging exploration of unknown environments and reducing ineffective interactions. Experimental results demonstrate that in structured 20 × 20 and 30 × 30 warehouse maps, C2Q achieves higher success rates and better path quality compared to other non-communication-based methods. In scenarios with fewer than 32 agents, C2Q achieves a success rate above 90%, performing comparably to the DCC algorithm, which leverages selective communication. Furthermore, ablation studies validate the effectiveness of both the intervention guidance and curiosity-driven modules.
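The abstract describes a dynamic intervention mechanism that monitors agents' recent rewards and triggers CBS guidance when progress stalls. A minimal sketch of such a trigger is shown below; the class name, sliding-window size, and threshold are illustrative assumptions, not details from the paper.

```python
from collections import deque


class InterventionMonitor:
    """Hypothetical sketch of a reward-based intervention trigger.

    Tracks a sliding window of recent rewards; when the window is full
    and the mean reward drops below `threshold`, the agent is assumed
    stuck in a local optimum and CBS guidance would be invoked.
    """

    def __init__(self, window=50, threshold=0.0):
        self.rewards = deque(maxlen=window)  # recent per-step rewards
        self.threshold = threshold

    def step(self, reward):
        """Record one reward; return True if CBS guidance should fire."""
        self.rewards.append(reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough history to judge yet
        return sum(self.rewards) / len(self.rewards) < self.threshold
```

In an actual training loop this check would run per agent each step, and a positive trigger would replace the learned policy's action with one from a CBS-planned path; those integration details are not specified in the abstract.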
External IDs: dblp:conf/icic/GuWZCFL25