Constrained Reinforcement Learning using Bender’s Decomposition and Exact Constraint Satisfaction

17 Sept 2025 (modified: 25 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: Reinforcement Learning, Constrained Reinforcement Learning, Optimization
TL;DR: We propose a novel Constrained Reinforcement Learning technique that guarantees hard constraints are upheld by using an implicit parametrization based on Bender's decomposition.
Abstract: Recent advances in reinforcement learning (RL) have extended its applications beyond sequential decision-making to non-sequential tasks such as matrix decomposition, automatic generation of sorting networks, and combinatorial optimization. However, these tasks often require problem-specific algorithm designs to ensure that solutions are valid. To address this limitation, we propose a universal framework that reformulates non-sequential tasks as constrained RL problems by learning to generate cutting planes, i.e., mathematical constraints that systematically refine the solution space. The framework guarantees constraint satisfaction throughout training, so learning remains safe and efficient even during deployment. We demonstrate the efficacy of our framework on two complex optimization problems: a reward-maximizing stochastic job-shop scheduling problem and a nonlinear, nonconvex packing problem. Our method achieves near-globally optimal solutions while accelerating convergence by up to a factor of 800.
Primary Area: reinforcement learning
Submission Number: 8806
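
For readers unfamiliar with cutting planes, the idea mentioned in the abstract can be illustrated with a minimal sketch. It is a generic Kelley-style cutting-plane loop for a one-dimensional convex problem, not the method proposed in the submission, and it involves no learning. The names `kelley_minimize`, `f`, and `grad` are hypothetical and chosen only for illustration. Each iteration adds one linear constraint (a tangent line) to a master problem, which tightens a lower bound on the optimum; Benders decomposition likewise adds cuts to a master problem iteratively.

```python
def kelley_minimize(f, grad, lo, hi, tol=1e-6, max_iter=100):
    """Minimize a convex 1-D function on [lo, hi] by adding tangent cuts.

    Illustrative sketch only: each cut (a, b) encodes the constraint
    t >= a * x + b, and the master problem minimizes t over [lo, hi].
    """
    cuts = []
    x = lo
    best_x, best_val = x, f(x)
    for _ in range(max_iter):
        fx, g = f(x), grad(x)
        if fx < best_val:
            best_x, best_val = x, fx
        # Tangent line at x: f(x) + g * (z - x) = g * z + (fx - g * x).
        cuts.append((g, fx - g * x))

        # Master problem: minimize the maximum of all cuts on [lo, hi].
        # In 1-D the minimizer is an endpoint or an intersection of two cuts.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if a1 != a2:
                    xi = (b2 - b1) / (a1 - a2)
                    if lo <= xi <= hi:
                        candidates.append(xi)

        def model(z):
            return max(a * z + b for a, b in cuts)

        x = min(candidates, key=model)
        lower_bound = model(x)
        # Stop once the best evaluated value meets the lower bound.
        if best_val - lower_bound <= tol:
            break
    return best_x, best_val, len(cuts)


# Usage: minimize (z - 0.5)^2 on [-1, 2].
x_star, f_star, n_cuts = kelley_minimize(
    lambda z: (z - 0.5) ** 2, lambda z: 2 * (z - 0.5), -1.0, 2.0
)
```

Every cut is a valid lower bound on a convex objective, so the master problem never excludes the true optimum while the feasible region for the bound shrinks with each iteration.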