Abstract: This letter investigates the target search problem for a network of autonomous vehicles, with the aim of maximizing the detection of randomly appearing targets within a given area. Because no prior knowledge of the targets is available, we propose a multi-vehicle cooperative persistent coverage scheme within the framework of multi-agent reinforcement learning, in contrast to the heuristic and model-based optimization methods of existing works. We model the persistent coverage problem as a partially observable Markov decision process (POMDP) because of the vehicles' limited observation ranges, and introduce a knowability map to characterize their knowledge of the target area. Each vehicle employs a distributed estimator, which combines its own observations with information shared by neighboring vehicles, to construct a globally estimated knowability map, thereby mitigating partial observability. The persistent coverage policies are learned with a centralized training and distributed execution architecture, which enables cooperative and efficient target search by fully exploiting the shared information. Moreover, we propose an adaptive partition method for the target area that keeps the dimension of the POMDP state space fixed, which can improve the scalability of the learned policy to target areas of various sizes. Simulations validate the effectiveness and scalability of the proposed cooperative scheme.
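The abstract does not define the knowability map or the adaptive partition, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that the map is a grid of values in [0, 1], that values decay over time and reset to 1 inside a vehicle's (square) sensing range, and that a fixed state dimension is obtained by average-pooling the grid into k x k regions. The names `update_knowability`, `partition_to_fixed_grid`, `sensing_radius`, and `decay` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def update_knowability(kmap: Grid, vehicles: List[Tuple[int, int]],
                       sensing_radius: int, decay: float = 0.9) -> Grid:
    """Decay every cell, then reset cells inside any sensing range to 1.

    Assumption: a square sensing footprint and a multiplicative decay;
    the paper may use a different observation and forgetting model.
    """
    rows, cols = len(kmap), len(kmap[0])
    new_map = [[value * decay for value in row] for row in kmap]
    for vr, vc in vehicles:
        for r in range(max(0, vr - sensing_radius),
                       min(rows, vr + sensing_radius + 1)):
            for c in range(max(0, vc - sensing_radius),
                           min(cols, vc + sensing_radius + 1)):
                new_map[r][c] = 1.0
    return new_map


def partition_to_fixed_grid(kmap: Grid, k: int) -> Grid:
    """Average-pool a rows x cols map into a k x k grid.

    Assumption: this stands in for the adaptive partition; the output
    size depends only on k, not on the size of the target area.
    """
    rows, cols = len(kmap), len(kmap[0])
    if rows < k or cols < k:
        raise ValueError("map must have at least k rows and k columns")
    pooled: Grid = []
    for i in range(k):
        r0, r1 = i * rows // k, (i + 1) * rows // k
        pooled_row = []
        for j in range(k):
            c0, c1 = j * cols // k, (j + 1) * cols // k
            block = [kmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled
```

Under these assumptions, a 4 x 6 area and an 8 x 8 area both reduce to the same k x k input, which is the property the abstract attributes to the adaptive partition.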
External IDs: dblp:journals/ral/LiLSSWW25