MultiAgent-DeepQ: A Reinforcement Learning Framework for Multi-Agent Exploration in Unknown Environments

Dibyendu Ghosh, Devodita Chakravarty

Published: 2025, Last Modified: 01 Apr 2026, Venue: ICAR 2025, License: CC BY-SA 4.0

Abstract: Multi-agent exploration in unknown environments is a fundamental problem in robotics and autonomous systems, with applications in surveillance, search and rescue, and environmental monitoring. Traditional Coverage Path Planning (CPP) approaches often suffer from inefficient coordination, high collision rates, and redundant movements, particularly as the number of agents increases. This paper introduces MultiAgent-DeepQ, a reinforcement learning framework that leverages a multi-headed Deep Q-learning architecture to optimize coordination and exploration efficiency. The framework enables agents to dynamically adapt movement policies based on learned coverage patterns, while minimizing redundant revisits and avoiding collisions. The proposed method was evaluated across multiple environments of increasing complexity and compared against existing multi-agent reinforcement learning approaches. The framework achieved 100% coverage in all tested environments, eliminated collisions, and significantly reduced the total number of steps—yielding an average 10.5× improvement over the baseline. Furthermore, the method demonstrated strong scalability from 2 to 12 agents without performance degradation. These results highlight the robustness and efficiency of MultiAgent-DeepQ, making it a promising solution for large-scale multi-agent exploration tasks.
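The abstract names a multi-headed Deep Q-learning architecture but does not give its layout. As a rough illustration only, the sketch below assumes one common reading: a shared encoder over a flattened coverage grid, followed by one Q-value head per agent, with epsilon-greedy action selection per head. The class and function names, the four-action grid world, the state encoding, and the layer sizes are all hypothetical and are not taken from the paper. The weights are random and untrained, so the sketch shows only the forward pass and action selection, not the learning procedure.

```python
import math
import random

# Assumed discrete action set for a grid world (not specified in the abstract).
ACTIONS = ["up", "down", "left", "right"]


def make_matrix(rows, cols, rng):
    """Return a small random weight matrix as a list of rows."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class MultiHeadQNetwork:
    """Shared ReLU trunk followed by one linear Q-value head per agent."""

    def __init__(self, obs_size, hidden_size, n_agents, n_actions, seed=0):
        rng = random.Random(seed)
        self.trunk = make_matrix(hidden_size, obs_size, rng)
        self.heads = [make_matrix(n_actions, hidden_size, rng) for _ in range(n_agents)]

    def q_values(self, observation):
        """Return a list with one Q-value vector per agent."""
        hidden = [max(0.0, h) for h in matvec(self.trunk, observation)]
        return [matvec(head, hidden) for head in self.heads]


def select_actions(q_per_agent, epsilon, rng):
    """Pick one action index per head: random with probability epsilon, else greedy."""
    actions = []
    for q in q_per_agent:
        if rng.random() < epsilon:
            actions.append(rng.randrange(len(q)))
        else:
            actions.append(max(range(len(q)), key=q.__getitem__))
    return actions


def flatten_state(visited, positions):
    """Encode the visited-cell grid plus normalized agent positions as one vector."""
    rows, cols = len(visited), len(visited[0])
    features = [1.0 if cell else 0.0 for row in visited for cell in row]
    for r, c in positions:
        features.extend([r / max(rows - 1, 1), c / max(cols - 1, 1)])
    return features


# Usage: two agents on a 4x4 grid, each starting in a corner.
visited = [[False] * 4 for _ in range(4)]
positions = [(0, 0), (3, 3)]
for r, c in positions:
    visited[r][c] = True

state = flatten_state(visited, positions)
net = MultiHeadQNetwork(obs_size=len(state), hidden_size=16,
                        n_agents=len(positions), n_actions=len(ACTIONS))
joint_action = select_actions(net.q_values(state), epsilon=0.1, rng=random.Random(1))
print([ACTIONS[a] for a in joint_action])
```

Sharing the trunk lets every head see the same coverage map, which is one plausible way for heads to learn to avoid revisiting cells. Whether the paper's heads correspond to agents, to actions, or to something else is not stated in the abstract.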

External IDs: dblp:conf/icar/GhoshC25