Abstract: Multi-agent path finding (MAPF) is one of the core challenges in multi-agent systems. Due to its NP-hard nature, agents are prone to getting stuck in local optima. Meanwhile, existing distributed planning methods often require frequent communication, making them unsuitable for communication-constrained scenarios. To address these challenges, this paper proposes C2Q, a multi-agent reinforcement learning algorithm that integrates Conflict-Based Search (CBS) intervention guidance and curiosity-driven exploration. Built upon the QMIX framework, C2Q introduces a dynamic intervention mechanism, which monitors agents’ recent rewards and adaptively triggers CBS guidance to help agents escape local optima. Additionally, a curiosity-driven mechanism is incorporated to generate intrinsic rewards, encouraging exploration of unknown environments and reducing ineffective interactions. Experimental results demonstrate that in structured 20 × 20 and 30 × 30 warehouse maps, C2Q achieves higher success rates and better path quality compared to other non-communication-based methods. In scenarios with fewer than 32 agents, C2Q achieves a success rate above 90%, performing comparably to the DCC algorithm, which leverages selective communication. Furthermore, ablation studies validate the effectiveness of both the intervention guidance and curiosity-driven modules.
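The abstract describes a dynamic intervention mechanism that monitors agents' recent rewards and triggers CBS guidance when progress stalls. A minimal sketch of such a trigger is shown below; the class name, sliding-window size, and threshold are illustrative assumptions, not details from the paper.

```python
from collections import deque


class InterventionMonitor:
    """Hypothetical sketch of a reward-based intervention trigger.

    Tracks a sliding window of recent rewards; when the window is full
    and the mean reward drops below `threshold`, the agent is assumed
    stuck in a local optimum and CBS guidance would be invoked.
    """

    def __init__(self, window=50, threshold=0.0):
        self.rewards = deque(maxlen=window)  # recent per-step rewards
        self.threshold = threshold

    def step(self, reward):
        """Record one reward; return True if CBS guidance should fire."""
        self.rewards.append(reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough history to judge yet
        return sum(self.rewards) / len(self.rewards) < self.threshold
```

In an actual training loop this check would run per agent each step, and a positive trigger would replace the learned policy's action with one from a CBS-planned path; those integration details are not specified in the abstract.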
External IDs: dblp:conf/icic/GuWZCFL25