Keywords: Large Model, Combinatorial Optimization, Transformers
TL;DR: We explore whether a unified model with a single architecture and parameter set can be developed to solve diverse combinatorial optimization problems.
Abstract: Combinatorial Optimization (CO) covers a wide range of problems that arise in many real-world scenarios, and solving them with learning-based methods has drawn great attention. Developing a unified deep model to solve diverse CO problems offers many benefits, including a reduced need for hand-crafted designs for individual problems and greater flexibility for few-shot learning on unseen problem types. However, a unified model with a single architecture and parameter set for diverse CO problems remains absent. To the best of our knowledge, we are the first to formally investigate and develop such a unified model. Motivated by the success of next-token prediction, we formulate the solution process of each problem as a Markov Decision Process (MDP) and train a model with a transformer backbone on tokenized data collected from solution trajectories. However, directly training the unified model is challenging due to the long token length of the trajectories, which arises from the complex observation space of CO problems, a consequence of their NP-hard nature. Furthermore, using the same model to simultaneously predict observations and actions, which are distinct types of elements within a trajectory, further increases training difficulty. To address these challenges, we introduce two key designs. First, to reduce token length, we implement a CO-prefix design that aggregates the static features of the problems. Second, to account for the heterogeneity of state and action tokens within the MDP, we adopt a two-stage self-supervised learning scheme. In the first stage, a dynamics prediction model is learned, which then serves as a pre-trained model for subsequent policy generation. Experiments across a set of nine problems demonstrate the robust problem-solving capabilities of our unified model, along with its few-shot and even zero-shot generalization abilities. We believe our framework provides a valuable complement to existing neural CO methods that focus on achieving optimal performance for individual CO problems.
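Illustrative sketch (not from the submission): the abstract describes a causal transformer over sequences of the form [CO-prefix tokens | state, action, state, action, ...], trained in two stages, first on state (dynamics) prediction and then on action (policy) prediction. The snippet below is a minimal, hypothetical rendering of that idea in PyTorch; the class, the token-type convention (0 = state, 1 = action), and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UnifiedCOModel(nn.Module):
    """Hypothetical sketch: a causal transformer over a tokenized trajectory
    [CO-prefix | s_0, a_0, s_1, a_1, ...] with separate state and action heads."""

    def __init__(self, d_model=128, n_heads=8, n_layers=4, vocab_size=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.state_head = nn.Linear(d_model, vocab_size)   # stage 1: dynamics prediction
        self.action_head = nn.Linear(d_model, vocab_size)  # stage 2: policy generation

    def forward(self, tokens):
        # tokens: (batch, seq_len) ids of discretized prefix / state / action tokens
        seq_len = tokens.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        h = self.backbone(self.embed(tokens), mask=causal_mask)
        return self.state_head(h), self.action_head(h)

def training_step(model, tokens, targets, token_types, stage):
    """Two-stage scheme (assumed): stage 1 supervises only state positions,
    stage 2 supervises only action positions; token_types marks each position."""
    state_logits, action_logits = model(tokens)
    per_token = nn.CrossEntropyLoss(reduction="none")(
        (state_logits if stage == 1 else action_logits).transpose(1, 2), targets
    )
    mask = (token_types == (0 if stage == 1 else 1)).float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Usage would amount to running stage 1 to convergence, then reusing the same backbone weights while optimizing only the stage-2 (action) loss, matching the pre-train-then-generate structure sketched in the abstract.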
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3569