COFormer: Towards a Foundation Model for Solving Combinatorial Optimization Problems

ICLR 2026 Conference Submission15679 Authors

19 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Foundation Models, Next-token Prediction, Combinatorial Optimization
TL;DR: COFormer is a unified transformer-based foundation model that solves diverse combinatorial optimization problems with one architecture and parameter set.
Abstract: Combinatorial Optimization Problems (COPs) encompass a wide range of real-world scenarios. While learning-based methods have achieved notable success on specialized COPs, the development of a unified architecture capable of solving diverse COPs with a single set of parameters remains an open challenge. In this work, we present COFormer, a novel framework that offers significant gains in both efficiency and practicality. Drawing inspiration from the success of next-token prediction in sequence modeling, we formulate the solution process of each COP as a Markov Decision Process (MDP), convert the resulting sequential trajectories into tokenized sequences, and train a transformer-based model on this data. To mitigate the long sequence lengths inherent in trajectory representations, we introduce a CO-prefix design that compactly encodes static problem features. Furthermore, to handle the heterogeneity between state and action tokens within the MDP, we adopt a three-stage learning strategy: first, a dynamics prediction model is pretrained via imitation learning; this model then serves as the foundation for policy generation and is subsequently fine-tuned using reinforcement learning. Extensive experiments across eight distinct COPs at various scales demonstrate COFormer's remarkable versatility, highlighting its ability to generalize to new, unseen problems with minimal fine-tuning, even in few-shot or zero-shot settings. Our approach complements existing neural methods for COPs that focus on optimizing performance for individual problems.
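The abstract's pipeline of converting an MDP trajectory into a token sequence prefixed by static problem features can be illustrated with a minimal sketch. All names, the quantization scheme, and the token-id layout below are illustrative assumptions, not COFormer's actual design, which the abstract does not specify:

```python
# Hypothetical sketch: static problem features are encoded once as a compact
# "CO-prefix", then (state, action) MDP steps are appended as tokens.
# Binning granularity and the disjoint action-id range are assumptions.

def build_sequence(static_features, trajectory, n_bins=256):
    """Discretize features into integer token ids: CO-prefix + trajectory."""
    def quantize(x):
        # map a float in [0, 1] to one of n_bins integer token ids
        return min(int(x * n_bins), n_bins - 1)

    prefix = [quantize(f) for f in static_features]  # CO-prefix: static data
    body = []
    for state, action in trajectory:                 # interleave MDP steps
        body.extend(quantize(s) for s in state)      # state feature tokens
        body.append(n_bins + action)                 # action tokens live in a
    return prefix + body                             # disjoint id range

# Toy example: 2 static features, 2 steps with 1-feature states / discrete actions
seq = build_sequence([0.5, 0.25], [([0.1], 3), ([0.9], 1)])
print(seq)  # → [128, 64, 25, 259, 230, 257]
```

A transformer trained with next-token prediction on such sequences would learn both the environment dynamics (state tokens) and the policy (action tokens), which is presumably why the abstract's three-stage strategy separates the two during training.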
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15679