BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization

Zijun Liao; Jinbiao Chen; Debing Wang; Zizhen Zhang; Jiahai Wang

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0

TL;DR: We propose preference optimization with best-anchored preference pairs and objective-guided loss, outperforming existing methods on NP-hard problems like JSP, TSP, and FJSP.

Abstract: Neural Combinatorial Optimization (NCO) has emerged as a promising approach for NP-hard problems. However, prevailing RL-based methods suffer from low sample efficiency due to sparse rewards and underused solutions. We propose *Best-anchored and Objective-guided Preference Optimization (BOPO)*, a training paradigm that leverages solution preferences via objective values. It introduces: (1) a best-anchored preference pair construction that better explores and exploits solutions, and (2) an objective-guided pairwise loss function that adaptively scales gradients via objective differences, removing reliance on reward models or reference policies. Experiments on the Job-shop Scheduling Problem (JSP), the Traveling Salesman Problem (TSP), and the Flexible Job-shop Scheduling Problem (FJSP) show that BOPO outperforms state-of-the-art neural methods, substantially reducing optimality gaps while keeping inference efficient. BOPO is architecture-agnostic, enabling seamless integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
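To make the two components in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `best_anchored_pairs` and `objective_guided_loss` are hypothetical, the objective is assumed to be a positive quantity to minimize (such as makespan or tour length), the best sampled solution is paired with every other sample, and the gradient scale is assumed to be the ratio of the worse objective to the best one. The exact pair filtering and scaling used in the paper may differ; see the linked code for the actual method.

```python
import math


def best_anchored_pairs(objectives):
    """Pair the best sampled solution with every other sample.

    Assumes a minimization problem. Returns (best_index, other_index) tuples.
    """
    best = min(range(len(objectives)), key=lambda i: objectives[i])
    return [(best, i) for i in range(len(objectives)) if i != best]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def objective_guided_loss(log_probs, objectives, pairs):
    """Average pairwise logistic loss with an objective-based scale.

    The scale (worse objective / best objective) is an assumed form: pairs
    with a larger objective gap contribute a larger margin. No reward model
    or reference policy is involved, only the policy's own log-probabilities.
    """
    if not pairs:
        raise ValueError("need at least two sampled solutions")
    total = 0.0
    for w, l in pairs:
        scale = objectives[l] / objectives[w]  # assumed positive objectives
        margin = scale * (log_probs[w] - log_probs[l])
        total += softplus(-margin)  # equals -log(sigmoid(margin))
    return total / len(pairs)


# Hypothetical example: three sampled schedules with their makespans and
# the policy's log-probability of each sampled solution.
objectives = [12.0, 10.0, 15.0]
log_probs = [-3.2, -2.9, -3.5]
pairs = best_anchored_pairs(objectives)
loss = objective_guided_loss(log_probs, objectives, pairs)
```

In an actual training loop the log-probabilities would come from a differentiable policy network, and the same loss would be written with a tensor library so that gradients flow back to the model parameters.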

Lay Summary: Complex planning tasks, like scheduling factory machines or mapping delivery routes, are incredibly hard and often take too long to solve perfectly. These challenges slow down businesses and waste resources. We created a new method called BOPO to tackle this. It teaches computers to compare different plans, learn from the best ones, and fine-tune solutions based on how good they are. Unlike older methods, BOPO works efficiently without needing extra complex systems. Our approach delivers faster, better plans for tasks like factory scheduling or city-to-city travel, outperforming other advanced tools. By sharing BOPO, we’re helping businesses save time and resources while making planning easier and more effective.

Link To Code: https://github.com/L-Z-7/BOPO

Primary Area: Optimization->Discrete and Combinatorial Optimization

Keywords: Neural Combinatorial Optimization; Preference Optimization; Machine Learning

Submission Number: 9969
