Neural Combinatorial Optimization with Reinforcement Learning

Irwan Bello; Hieu Pham; Quoc Le; Mohammad Norouzi; Samy Bengio

12 Jul 2025 (modified: 22 Jun 2025). Submitted to ICLR 2017. Readers: Everyone

TL;DR: neural combinatorial optimization, reinforcement learning

Abstract: We present a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network using a policy gradient method. Without much engineering or heuristic design, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. These results, albeit still quite far from the state of the art, give insights into how neural networks can be used as a general tool for tackling combinatorial optimization problems.
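To make the training signal in the abstract concrete, here is a minimal sketch of a policy gradient (REINFORCE) update that uses negative tour length as the reward. It is not the authors' implementation. The recurrent network is replaced by a hypothetical table of pairwise logits. The function names, the fixed start at city 0, the moving-average baseline, and the learning rate are all assumptions made for illustration.

```python
import math
import random


def tour_length(coords, perm):
    """Length of the closed tour that visits coords in the order given by perm."""
    n = len(perm)
    return sum(math.dist(coords[perm[i]], coords[perm[(i + 1) % n]]) for i in range(n))


def sample_tour(logits, rng):
    """Sample a permutation city by city from a softmax over unvisited cities.

    Assumption: logits[i][j] scores moving from city i to city j. This table
    stands in for the recurrent network described in the abstract.
    """
    n = len(logits)
    current = 0  # assumption: every tour starts at city 0
    perm, steps = [current], []
    unvisited = list(range(1, n))
    while unvisited:
        scores = [logits[current][j] for j in unvisited]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        choice = rng.choices(unvisited, weights=probs)[0]
        steps.append((current, list(unvisited), probs, choice))
        unvisited.remove(choice)
        perm.append(choice)
        current = choice
    return perm, steps


def reinforce_step(logits, coords, rng, baseline, lr=0.1, decay=0.9):
    """One REINFORCE update in place; returns (reward, updated baseline)."""
    perm, steps = sample_tour(logits, rng)
    reward = -tour_length(coords, perm)  # negative tour length as the reward
    advantage = reward - baseline        # assumption: moving-average baseline
    for current, candidates, probs, choice in steps:
        for j, p in zip(candidates, probs):
            # Gradient of log softmax w.r.t. logit j: 1[j == choice] - p_j
            grad = (1.0 if j == choice else 0.0) - p
            logits[current][j] += lr * advantage * grad
    return reward, decay * baseline + (1.0 - decay) * reward
```

Calling `reinforce_step` repeatedly on one set of coordinates shifts probability toward shorter tours for that instance. A learned network, as in the abstract, would instead share parameters across many instances.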

Conflicts: google.com

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/neural-combinatorial-optimization-with/code)
