Policy optimization in reinforcement learning for column generation

TMLR Paper 2566 Authors

22 Apr 2024 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Column generation (CG) is essential for solving large-scale integer linear programming problems in many industrial domains. Despite its importance, CG algorithms suffer from slow convergence, and several heuristics have been developed to address this issue. However, few machine learning or reinforcement learning methods are available that enhance existing CG algorithms. This paper introduces \textbf{PPO-CG}, a new policy optimization RL framework that improves on the existing DQN-based CG framework, particularly in training time. When applied to the Cutting Stock Problem (CSP), our approach requires only \textbf{20\%} of the training time of the DQN-based method, and only \textbf{35\%} on the Vehicle Routing Problem with Time Windows (VRPTW). In addition, our approach suggests a novel way to cast the node selection problem in the framework of reinforcement learning on graphs.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Thomy_Phan1
Submission Number: 2566