Conditional Policy Generator for Dynamic Constraint Satisfaction and Optimization

TMLR Paper 5409 Authors

17 Jul 2025 (modified: 23 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: Leveraging machine learning methods to solve constraint satisfaction problems has shown promise, but such methods are mostly limited to static settings where the problem description is completely known and fixed from the beginning. In this work we present a new approach to constraint satisfaction and optimization in dynamically changing environments, particularly when the variables in the problem are statistically independent. We frame the task as a reinforcement learning problem and introduce a conditional policy generator, borrowing the idea of class-conditional generative adversarial networks (GANs). Assuming the problem includes both static and dynamic constraints, the static constraints are used in a reward formulation to guide policy training, so that the policy learns to map a noise prior to a probability distribution over solutions satisfying the static constraints, analogous to a generator in GANs. Dynamic constraints, on the other hand, are encoded as class labels and fed together with the input noise. The policy is simultaneously updated, in a supervised manner, to maximize the likelihood of correctly classifying the given dynamic conditions. We empirically demonstrate a proof-of-principle experiment on a multi-modal constraint satisfaction problem and compare the unconditional and conditional cases.
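To make the described training setup concrete, the following is a minimal sketch (not the authors' implementation) of how a conditional policy generator of this kind could be trained: noise plus a one-hot dynamic-constraint label is mapped to a candidate solution, a REINFORCE-style term rewards satisfaction of an assumed static constraint, and a supervised cross-entropy term encourages correct classification of the dynamic condition. All names, dimensions, and the placeholder constraint are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM = 16
NUM_CONDITIONS = 4   # number of dynamic-constraint classes (assumed)
SOLUTION_DIM = 8     # dimensionality of a candidate solution (assumed)

class ConditionalPolicy(nn.Module):
    """Hypothetical conditional policy generator: noise + condition label -> solution."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(NOISE_DIM + NUM_CONDITIONS, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
        )
        self.solution_head = nn.Linear(64, SOLUTION_DIM)   # proposes a solution
        self.class_head = nn.Linear(64, NUM_CONDITIONS)    # predicts the dynamic condition

    def forward(self, z, cond_onehot):
        h = self.backbone(torch.cat([z, cond_onehot], dim=-1))
        return self.solution_head(h), self.class_head(h)

def static_reward(solution):
    # Placeholder static constraint: reward 1 when ||x|| <= 1, else 0.
    # The paper's actual constraints are problem-specific.
    return (solution.norm(dim=-1) <= 1.0).float()

policy = ConditionalPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(1000):
    z = torch.randn(32, NOISE_DIM)
    labels = torch.randint(0, NUM_CONDITIONS, (32,))
    cond = F.one_hot(labels, NUM_CONDITIONS).float()

    mean, logits = policy(z, cond)

    # REINFORCE-style term: treat the solution head as the mean of a Gaussian
    # policy and weight sampled log-probabilities by the static-constraint reward.
    dist = torch.distributions.Normal(mean, 1.0)
    sample = dist.sample()
    log_prob = dist.log_prob(sample).sum(dim=-1)
    rl_loss = -(static_reward(sample) * log_prob).mean()

    # Supervised term: maximize the likelihood of the dynamic-constraint label.
    cls_loss = F.cross_entropy(logits, labels)

    loss = rl_loss + cls_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

This sketch only illustrates combining a reward-weighted policy-gradient loss with a supervised classification loss; the paper's network architecture, reward shaping, and constraint encodings may differ.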
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Emmanuel_Bengio1
Submission Number: 5409