Learning to Branch with Offline Reinforcement Learning

Shengyu Feng; Yiming Yang

Learning to Branch with Offline Reinforcement Learning

Shengyu Feng, Yiming Yang

22 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Mixed Integer Linear Programming, Combinatorial Optimization, Branch-and-Bound, Offline Reinforcement Learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We propose a novel offline RL algorithm for neural branching in branch-and-bound algorithm in mixed integer programming.

Abstract: Mixed Integer Linear Program (MILP) solvers are mostly built upon a branch-and-bound (B\&B) algorithm, where the efficiency of traditional solvers heavily depends on hand-craft heuristics for branching. Such a dependency significantly limits the success of those solvers because such heuristics are often difficult to obtain, and not easy to generalize across domains/problems. Recent deep learning approaches aim to automatically learn the branching strategies in a data-driven manner, which removes the dependency on hand-crafted heuristics but introduces a dependency on the availability of high-quality training data. Obtaining the training data that demonstrates near-optimal branching strategies can be a difficult task itself, especially for large problems where accurate solvers have a hard time scaling and producing near-optimal demonstrations. This paper overcomes this obstacle by proposing a new offline reinforcement learning (RL) approach, namely the \textit{Ranking-Constrained Actor-Critic} algorithm, which can efficiently learn good branching strategies from sub-optimal or inadequate training signals. Our experiments show its advanced performance in both prediction accuracy and computational efficiency over previous methods for different types of MILP problems on multiple evaluation benchmarks.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: pdf

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4761

Loading