Dantzig-Wolfe Decomposition and Deep Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Column Generation, Reinforcement Learning, MILP, bin-packing, combinatorial, optimisation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Combining classic optimisation decomposition methods with RL formulations of combinatorial optimisation problems, such as the bin-packing problem, to reduce action-space size and episode length.
Abstract: The 3D bin-packing problem is an NP-hard optimisation problem. RL solutions in the literature tackle simplified versions of the full problem because of its large action space and long episode lengths. This work uses a Dantzig-Wolfe formulation to decompose the full problem into a set-partitioning master problem and a 3D knapsack sub-problem. The RL agent solves the 3D knapsack problem, and CPLEX (a mixed-integer linear programming solver) solves the set-partitioning problem. This removes the bin-selection action from the agent's action space and reduces the episode length to the number of items required to fill one bin rather than all items in the instance. We thereby simplify the learning problem compared with the full 3D bin-packing case. At inference time, the trained agent iteratively generates columns of the Dantzig-Wolfe formulation using the column generation procedure. This algorithm provided improved solutions on up to 28/47 instances compared with those obtained by successively applying the RL agent to optimise volume occupation in a bin with the remaining items. RL solutions alone cannot provide valid lower bounds on the optimal solution value. This work therefore also uses the Dantzig-Wolfe formulation and column generation to improve on existing SOTA lower bounds, replacing the RL agent with an integer linear program for the 3D knapsack problem. Using CPLEX to solve both the master problem and the sub-problems, lower bounds improving on the SOTA were found on 17/47 instances.
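To make the decomposition concrete, the pricing step of column generation can be sketched as follows. In a standard set-partitioning master (assumed here, not quoted from the paper), each column a_p is a feasible packing of one bin, and the master minimises the sum of lambda_p subject to sum_p a_ip * lambda_p = 1 for every item i. Given duals pi_i from the LP relaxation of the restricted master, the sub-problem looks for the packing that maximises sum_i pi_i * a_i; that packing is added as a new column only if its reduced cost 1 - sum_i pi_i * a_i is negative. This is a minimal sketch under loud assumptions: items are 1D with integer sizes (a stand-in for the 3D geometry), the sub-problem is solved exactly by a 0/1 knapsack dynamic program instead of the RL agent or the CPLEX ILP, and the function names `knapsack_pricing` and `price_column` are hypothetical.

```python
from typing import List, Optional, Tuple


def knapsack_pricing(sizes: List[int], capacity: int,
                     duals: List[float]) -> Tuple[float, List[int]]:
    """Solve max sum(duals[i] * a[i]) s.t. sum(sizes[i] * a[i]) <= capacity, a[i] in {0, 1}.

    Hypothetical 1D stand-in for the 3D knapsack sub-problem, solved exactly
    by dynamic programming over the remaining capacity.
    """
    n = len(sizes)
    # best[i][c]: best total dual value using items 0..i-1 within capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, v = sizes[i - 1], duals[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave item i-1 out
            # option 2: pack item i-1 (only worthwhile if its dual is positive)
            if s <= c and v > 0 and best[i - 1][c - s] + v > best[i][c]:
                best[i][c] = best[i - 1][c - s] + v
    # Backtrack: a changed value means item i-1 was packed
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= sizes[i - 1]
    return best[n][capacity], sorted(chosen)


def price_column(sizes: List[int], capacity: int, duals: List[float],
                 eps: float = 1e-9) -> Optional[List[int]]:
    """Return a new bin pattern (item indices) if its reduced cost is negative."""
    value, pattern = knapsack_pricing(sizes, capacity, duals)
    reduced_cost = 1.0 - value  # every column (one bin) costs 1 in the master
    return pattern if reduced_cost < -eps else None


if __name__ == "__main__":
    sizes, capacity = [4, 3, 3, 2], 8
    # With one singleton column per item, the restricted master LP has the
    # identity basis, so every dual is exactly 1 at the first iteration.
    print(price_column(sizes, capacity, [1.0] * len(sizes)))  # [1, 2, 3]
    # With smaller duals no packing beats a value of 1, so no column is added.
    print(price_column(sizes, capacity, [0.5, 0.3, 0.3, 0.2]))  # None
```

A full loop would alternate between re-solving the restricted master LP (CPLEX in the paper) and this pricing step until no column with negative reduced cost remains. The rounded-up LP value is a valid lower bound only when pricing is solved exactly at that final step, which is why an approximate pricer such as an RL agent can generate good columns but cannot certify bounds, whereas an exact ILP sub-problem can.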
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7192