TreeDQN: Sample-Efficient Off-Policy Reinforcement Learning for Combinatorial Optimization

26 Sept 2024 (modified: 10 Dec 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: reinforcement learning, combinatorial optimization, branch-and-bound, ML4CO
TL;DR: We present a data-efficient off-policy reinforcement learning method to learn a branching heuristic for the Branch-and-Bound algorithm.
Abstract: A convenient approach to solving combinatorial optimization tasks optimally is the Branch-and-Bound method. The branching heuristic in this method can be learned to solve a large set of similar tasks. Promising results here have been achieved by a recently proposed on-policy reinforcement learning (RL) method based on the tree Markov Decision Process (tMDP). To overcome its main disadvantages, namely very long training time and unstable training, we propose TreeDQN, a sample-efficient off-policy RL method trained by optimizing the geometric mean of the expected return. To theoretically support the training procedure of our method, we prove the contraction property of the Bellman operator for the tree MDP. As a result, our method requires up to 10 times less training data and performs faster than known on-policy methods on synthetic tasks. Moreover, TreeDQN significantly outperforms state-of-the-art techniques on a challenging practical task from the ML4CO competition.
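To make the training objective concrete, below is a minimal, hypothetical sketch of a tree-MDP TD update in PyTorch. It assumes Q(s, a) estimates the log of the subtree size produced by branching on variable a, so that a leaf has value 0, a parent node combines its two children multiplicatively via log-sum-exp, and regressing in log space corresponds to optimizing the geometric mean of the return, as the abstract describes. All names (q_net, target_net, the batch fields) are illustrative placeholders, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def tree_td_loss(q_net, target_net, batch):
    """Hypothetical TD loss for a tree MDP, sketched under the assumptions above.

    Assumption: Q(s, a) = log(size of the subtree rooted at s after branching
    on variable a). Branching splits a node into two children, so
    size(parent) = 1 + size(left) + size(right), which in log space becomes
    log(e^0 + e^{v_left} + e^{v_right}).
    """
    # Predicted log-subtree-size of the action actually taken at the parent node.
    q_pred = q_net(batch.state).gather(1, batch.action.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Greedy values of the two children; the branching agent tries to
        # MINIMIZE the final tree size, so the best action has minimal Q.
        v_left = target_net(batch.left_child).min(dim=1).values
        v_right = target_net(batch.right_child).min(dim=1).values

        # A fathomed (leaf) child has subtree size 1, i.e. log-size 0.
        v_left = torch.where(batch.left_is_leaf, torch.zeros_like(v_left), v_left)
        v_right = torch.where(batch.right_is_leaf, torch.zeros_like(v_right), v_right)

        # Tree-Bellman target in log space: log(1 + size_left + size_right).
        target = torch.logsumexp(
            torch.stack([torch.zeros_like(v_left), v_left, v_right]), dim=0
        )

    # MSE on log-returns: minimizing it optimizes the geometric mean of the
    # (multiplicative) tree-size return across instances of very different scale.
    return F.mse_loss(q_pred, target)
```

Working in log space is what makes the geometric-mean objective natural here: final tree sizes vary across instances by orders of magnitude, and an arithmetic-mean loss on raw sizes would be dominated by the hardest instances in the batch.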
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6657