Population-Based Reinforcement Learning for Combinatorial Optimization Problems

Nathan Grinsztajn; Daniel Furelos-Blanco; Thomas D Barrett

Published: 01 Feb 2023, Last Modified: 22 Jun 2025. Submitted to ICLR 2023. Readers: Everyone

Keywords: reinforcement learning, combinatorial optimization, population

TL;DR: We present a population-based RL method for CO problems: the training procedure makes the agents complementary to maximize the population's performance.

Abstract: Applying reinforcement learning to combinatorial optimization problems is attractive because it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Leading approaches are therefore often augmented with additional search strategies, from stochastic sampling and beam search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary agents, which can be rolled out simultaneously at inference. To this end, we introduce Poppy, a simple, theoretically grounded training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the whole population. We show that Poppy leads to a set of complementary heuristics, and we obtain state-of-the-art results on three popular NP-hard problems: the traveling salesman problem (TSP), the capacitated vehicle routing problem (CVRP), and the 0-1 knapsack problem (KP). On TSP specifically, Poppy reduces the optimality gap by a factor of 5 and the inference time by a factor of more than 10 compared to previous state-of-the-art reinforcement learning approaches.
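The inference-time idea in the abstract, rolling out several agents on the same instance and keeping the best result, can be illustrated with a minimal sketch. This is not the Poppy training procedure and does not use learned policies: the "agents" here are hypothetical stand-ins (nearest-neighbour TSP heuristics that differ only in their start city), and all function names are assumptions made for illustration.

```python
import math
import random


def tour_length(cities, tour):
    """Total length of a closed tour visiting cities in the given order."""
    n = len(tour)
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n)
    )


def nearest_neighbor_tour(cities, start):
    """Stand-in 'agent': greedy nearest-neighbour construction from `start`."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def population_rollout(cities, starts):
    """Roll out every stand-in agent and report which one did best.

    The population is scored by its best member on this instance,
    i.e. the shortest tour among all rollouts.
    """
    tours = [nearest_neighbor_tour(cities, s) for s in starts]
    lengths = [tour_length(cities, t) for t in tours]
    best = min(range(len(starts)), key=lengths.__getitem__)
    return lengths, best


# Hypothetical random instance with 20 cities in the unit square.
rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(20)]

lengths, best = population_rollout(cities, starts=[0, 5, 10, 15])
population_length = lengths[best]  # score of the population on this instance
```

Because the population score is the minimum over its members, it can never be worse than any single agent's tour, which is why complementary agents are more useful than several copies of one strong agent.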


Area: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/population-based-reinforcement-learning-for/code)
