Top-two ListMLE Reinforcement Learning Based UGVs Formation Control with Changeable Pattern

Published: 30 Apr 2025, Last Modified: 01 Jun 2025 · OpenReview Archive Direct Upload · Everyone · CC BY 4.0
Abstract: In this paper, we propose a reinforcement-learning-based method for UGV formation control with a changeable pattern, built on a top-two selection mechanism. The mechanism fixes the number of utilized neighbors by training a scoring network with a ListMLE loss computed between the top-two candidates ranked from limited neighbor information and the top-two estimates given by a global value network. With a fixed number of utilized neighbors, formation errors and other neighbor-related information can be fed directly into the policy network without structural modifications. Unlike approaches that embed formation constraints in the value network, our method inputs the formation error directly into the policy network, enabling formation control with a changeable pattern. In the experiments, we further propose a training data processing method that handles varying numbers of neighbors across batches, vectorizing neighbor sequences with masking techniques to improve training efficiency.
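To make the top-two ListMLE objective and the neighbor-masking scheme concrete, the sketch below shows one plausible PyTorch implementation under stated assumptions; it is not the authors' code. The function name `top_two_listmle_loss` and the tensor layout (batch of padded neighbor sequences with a boolean validity mask) are illustrative choices, where `scores` stands in for the local scoring network's outputs and `target_scores` for the global value network's estimates.

```python
import torch

def top_two_listmle_loss(scores, target_scores, mask):
    """Top-two ListMLE loss over padded, variable-length neighbor sets (sketch).

    scores:        (batch, max_neighbors) local scoring network outputs
    target_scores: (batch, max_neighbors) global value network estimates
    mask:          (batch, max_neighbors) bool, True for valid (non-padded) neighbors
    Assumes each sample has at least two valid neighbors.
    """
    neg_inf = torch.finfo(scores.dtype).min
    # Padded slots must never appear in the target ranking or the normalizer.
    scores = scores.masked_fill(~mask, neg_inf)
    target_scores = target_scores.masked_fill(~mask, neg_inf)

    # Target ordering: rank candidates by the global value network's estimates.
    order = torch.argsort(target_scores, dim=-1, descending=True)
    scores_sorted = torch.gather(scores, -1, order)

    # Plackett-Luce log-likelihood, truncated to the first two ranks (top-two):
    # the normalizer at rank k is logsumexp over the suffix scores_sorted[k:].
    rev_cumlse = torch.flip(
        torch.logcumsumexp(torch.flip(scores_sorted, [-1]), dim=-1), [-1]
    )
    log_probs = scores_sorted - rev_cumlse      # per-rank log-probabilities
    loss = -(log_probs[:, :2]).sum(dim=-1)      # keep only the top-two terms
    return loss.mean()
```

Because padded slots are filled with a large negative value before sorting, they fall to the end of the ranking and contribute (numerically) nothing to the suffix normalizers, so a whole batch with differing neighbor counts can be processed in one vectorized call, which is the efficiency gain the abstract attributes to the masking-based data processing.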