A multi-head adaptive actor-critic algorithm for solving vehicle routing problems

Dawen Xia, Youlong Jin, Mingyue Huang, Yang Hu, Yujia Huo, Ziqiang Wang, Fujian Feng, Yantao Li, Huaqing Li

Published: 2025, Last Modified: 04 Nov 2025. Appl. Intell. 2025. License: CC BY-SA 4.0

Abstract: Vehicle routing problems (VRPs) pose significant challenges in intelligent transportation systems (ITSs) and play a crucial role in traffic route planning. Deep reinforcement learning (DRL) models based on encoder-decoder structures have demonstrated considerable potential for VRP applications. However, encoder-decoder structures based on the traditional actor-critic (AC) algorithm are limited by poor adaptation to environmental information, static generation of control regulation parameters, and a single decoding strategy, which lowers the solving capability of DRL on VRPs. To address these problems, we propose a multi-head adaptive actor-critic (MHAAC) algorithm and integrate it into an end-to-end DRL framework. The proposed algorithm combines a multi-head attention mechanism with a dynamic parameter generation strategy for the environment, which significantly enhances both its information processing capability and its adaptability to the environment. Specifically, we introduce an additive attention layer between the encoder and the decoder to generate dynamic context vectors and extract fine-grained feature embeddings. We then design a multi-head actor network to carry out the solution construction process and regulate the actor network through the adaptive mechanism of the critic network. Furthermore, we propose a hybrid solution search algorithm (HS) that integrates several traditional search methods, improving solution quality and optimizing the parameters of the whole framework with a gradient strategy. Finally, empirical evaluations on four standard datasets with 10, 20, 50, and 100 customer nodes demonstrate that MHAAC outperforms existing specialized solvers and other DRL methods in both solution quality and efficiency.
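
The abstract mentions an additive attention layer that produces a dynamic context vector from the encoder output. The abstract does not give the layer's equations, so the following is only a minimal NumPy sketch of generic additive (Bahdanau-style) attention, not the authors' implementation. The function name `additive_attention`, the parameters `W_q`, `W_k`, and `v`, and all dimensions are assumptions made for illustration.

```python
import numpy as np


def additive_attention(query, keys, W_q, W_k, v):
    """Generic additive attention (illustrative, not the MHAAC layer).

    query: (d_q,)     hypothetical decoder state
    keys:  (n, d_k)   hypothetical encoder embeddings, one per customer node
    W_q:   (d_h, d_q) query projection
    W_k:   (d_h, d_k) key projection
    v:     (d_h,)     scoring vector
    """
    # score_i = v^T tanh(W_q q + W_k k_i), computed for all nodes at once
    hidden = np.tanh(W_q @ query + keys @ W_k.T)  # shape (n, d_h)
    scores = hidden @ v                           # shape (n,)
    # numerically stable softmax over the nodes
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # context vector: attention-weighted sum of the node embeddings
    context = weights @ keys                      # shape (d_k,)
    return context, weights


# Example with random data: 5 nodes, 8-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
n_nodes, d_model, d_hidden = 5, 8, 16
keys = rng.normal(size=(n_nodes, d_model))
query = rng.normal(size=d_model)
W_q = rng.normal(size=(d_hidden, d_model))
W_k = rng.normal(size=(d_hidden, d_model))
v = rng.normal(size=d_hidden)

context, weights = additive_attention(query, keys, W_q, W_k, v)
```

Because the weights depend on the current query, the context vector changes at every decoding step, which is one plausible reading of "dynamic" in the abstract.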

External IDs: dblp:journals/apin/XiaJHHHWFLL25