Multi-Action Sampling with Deep Reinforcement Learning for Traveling Salesman Problem

Wei Liu; Thomas Bäck; Yingjie Fan

Published: 04 Apr 2025, Last Modified: 09 Jun 2025. Venue: LION19 2025. License: CC BY 4.0

Tracks: Main Track

Keywords: Vehicle Routing; Deep Reinforcement Learning; Multi-action Sampling; Neural Networks

Abstract: The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with significant practical applications in logistics and transportation. Recent advances in Deep Reinforcement Learning (DRL) have shifted the focus from traditional heuristic methods to data-driven approaches for solving the TSP. Although DRL-based solutions offer promising results, they often struggle to match classical heuristics in computational efficiency and solution quality. To further improve efficiency, this paper introduces a novel multi-action sampling strategy that enhances the Learn-to-Improve (L2I) framework for the TSP. During training, the proposed approach improves solution quality by averaging rewards over multiple sampled actions, which mitigates bias and promotes more effective exploration. During inference, multi-action sampling is applied in the later stages to explore alternative solutions in a novel parallel manner, mitigating inadequate convergence. Experimental results demonstrate that the proposed method outperforms existing L2I approaches and achieves near-optimal solutions with competitive computational efficiency.
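The reward-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the improvement operator, the policy, or the reward definition, so the sketch assumes a 2-opt move as the action, a fixed probability vector standing in for the policy network's output, and the reduction in tour length as the reward. All function names (`tour_length`, `two_opt`, `multi_action_reward`) are hypothetical.

```python
import math
import random


def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over the given city coordinates."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour, i, j):
    """Assumed improvement operator: reverse the segment tour[i..j] (inclusive)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def multi_action_reward(tour, coords, actions, probs, k, rng):
    """Sample k actions from the (stand-in) policy and average their rewards.

    Reward is assumed to be the tour-length reduction achieved by one action.
    Returns the averaged reward and the list of sampled actions.
    """
    sampled = rng.choices(actions, weights=probs, k=k)
    base = tour_length(tour, coords)
    rewards = [base - tour_length(two_opt(tour, i, j), coords) for i, j in sampled]
    return sum(rewards) / k, sampled


if __name__ == "__main__":
    coords = [(0, 0), (1, 1), (1, 0), (0, 1)]   # unit square, visited in a crossing order
    tour = [0, 1, 2, 3]
    actions = [(1, 2), (1, 3), (2, 3)]          # candidate 2-opt segments
    probs = [1 / 3, 1 / 3, 1 / 3]               # placeholder for policy probabilities
    avg, sampled = multi_action_reward(tour, coords, actions, probs, k=4, rng=random.Random(0))
    print(sampled, avg)
```

In a policy-gradient setting, the averaged value would replace the single-action reward in the training signal, which is one plausible reading of how averaging reduces the variance of a one-sample estimate; the paper itself should be consulted for the exact formulation.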

Submission Number: 63
