Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

Mingfeng Fan; Jianan Zhou; Yifeng Zhang; Yaoxin Wu; Jinbiao Chen; Guillaume Adrien Sartoretti

Published: 22 Sept 2025, Last Modified: 22 Sept 2025. Venue: WiML @ NeurIPS 2025. License: CC BY 4.0.

Keywords: Multi-objective combinatorial optimization; preference optimization; conditional computation

Abstract: Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector. However, these methods typically treat all subproblems equally and solve them with a single model, which hinders effective exploration of the solution space and leads to suboptimal performance. To overcome this limitation, we propose POCCO, a novel plug-and-play framework that adaptively selects model structures for subproblems and then optimizes them based on preference signals rather than explicit reward values. Specifically, POCCO integrates a conditional computation block into the decoder, where a sparse gating network dynamically routes each subproblem through either a subset of feed-forward (FF) experts or a parameter-free identity (ID) expert. This enables context-aware selection of computation paths, effectively scaling model capacity and enhancing representation learning. Moreover, POCCO replaces raw scalarized rewards with pairwise preference learning: for each subproblem, the policy samples two trajectories, identifies the preferred one, and optimizes a Bradley–Terry likelihood based on their average log-likelihoods. This comparative feedback guides learning toward preferred solutions, promoting efficient exploration and faster convergence. We integrate POCCO into two state-of-the-art neural MOCOP solvers, CNH and WE-CA, yielding POCCO-C and POCCO-W, respectively. As shown in Table 1, POCCO-W consistently outperforms WE-CA across all benchmarks, setting a new state of the art among neural MOCOP methods. Similarly, POCCO-C surpasses CNH in every case.
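
The two mechanisms described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not give the exact routing rule or loss scaling, so the following are assumptions made for illustration: the function names, the top-k routing with softmax renormalization, treating the identity expert as one more candidate in the top-k selection, the temperature `beta`, and the use of a scalarized cost (lower is better) to decide which trajectory is preferred.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns {expert_index: weight}. Top-k with softmax renormalization is
    an assumption; the abstract only says the gating is sparse.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return dict(zip(top, weights))

def conditional_block(h, gate_logits, ff_experts, k=2):
    """Route hidden vector h through the selected experts.

    The last gate logit belongs to a parameter-free identity expert
    (hypothetical layout, chosen for this sketch).
    """
    experts = list(ff_experts) + [lambda x: list(x)]  # identity expert last
    assert len(gate_logits) == len(experts)
    out = [0.0] * len(h)
    for idx, w in top_k_route(gate_logits, k).items():
        y = experts[idx](h)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

def avg_log_likelihood(step_log_probs):
    """Average per-step log-probability of one sampled trajectory."""
    return sum(step_log_probs) / len(step_log_probs)

def bradley_terry_loss(logp_preferred, logp_other, beta=1.0):
    """Negative log-likelihood that the preferred trajectory wins.

    Equals -log(sigmoid(beta * (logp_preferred - logp_other))), computed
    in a numerically stable way. beta is a hypothetical temperature.
    """
    margin = beta * (logp_preferred - logp_other)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def preference_loss(traj_a, traj_b, cost_a, cost_b, beta=1.0):
    """Compare two trajectories for one subproblem.

    traj_*: per-step log-probs under the policy.
    cost_*: scalarized cost of each solution (assumed: lower is preferred).
    """
    la, lb = avg_log_likelihood(traj_a), avg_log_likelihood(traj_b)
    if cost_a <= cost_b:
        return bradley_terry_loss(la, lb, beta)
    return bradley_terry_loss(lb, la, beta)
```

In this sketch, a subproblem whose gate strongly favors the identity expert with `k=1` passes `h` through unchanged, so no FF expert is evaluated for it. When both sampled trajectories have the same average log-likelihood, the preference loss equals log 2, and it falls as the policy assigns a higher average log-likelihood to the preferred trajectory.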

Submission Number: 379