Abstract: The traveling salesman problem (TSP), one of the most fundamental NP-hard problems in combinatorial optimization, has received considerable attention owing to its direct applicability to real-world routing. Recent studies on TSP have adopted deep policy networks to learn a stochastic acceptance rule. Despite their success in some cases, the structural and functional complexity of deep policy networks makes it hard to explore the problem space while simultaneously performing a local search. Our empirical analyses show that the search process often becomes stuck in a local region, leading to severe performance degradation. To tackle this issue, we propose a novel method for multi-mode policy learning. In the proposed method, the conventional exploration-exploitation scheme is reformulated as the problem of learning to escape from a local search area, thereby inducing exploration. We present a multi-mode Markov decision process, followed by policy and value designs for the local search and escaping modes. Experimental results show that the proposed method outperforms various baseline models, suggesting that the learned escaping policy allows the model to efficiently initiate a new local search in promising regions.
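To make the two-mode idea concrete, below is a minimal sketch of the control flow the abstract describes: a local-search mode that applies neighborhood moves until progress stalls, and an escaping mode that perturbs the tour to start a new local search elsewhere. The paper's learned policies and value functions are not reproduced here; the `two_opt_move` and `escape_move` heuristics, the `patience` threshold, and all other names are illustrative assumptions standing in for the learned components.

```python
import random

def tour_length(tour, dist):
    # Total length of a closed tour under the distance matrix `dist`.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_move(tour):
    # A standard 2-opt neighborhood move: reverse a random segment.
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def escape_move(tour, strength=3):
    # Stand-in for the learned escaping policy: a random multi-swap
    # perturbation that jumps out of the current local search area.
    tour = tour[:]
    for _ in range(strength):
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def multi_mode_search(dist, n_steps=10_000, patience=50):
    # Alternate between local-search and escaping modes, switching to
    # escape after `patience` consecutive non-improving steps.
    n = len(dist)
    tour = random.sample(range(n), n)
    best, best_len = tour, tour_length(tour, dist)
    stale = 0
    for _ in range(n_steps):
        if stale < patience:                 # local-search mode
            cand = two_opt_move(tour)
        else:                                # escaping mode
            cand, stale = escape_move(tour), 0
        if tour_length(cand, dist) < tour_length(tour, dist):
            tour, stale = cand, 0
        else:
            stale += 1
        if tour_length(cand, dist) < best_len:
            best, best_len = cand, tour_length(cand, dist)
    return best, best_len
```

In the proposed method, both the acceptance of local moves and the timing and shape of the escape would be driven by learned policies under the multi-mode MDP rather than the fixed random heuristics used in this sketch.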