Keywords: RLHF, RL, Constraint Learning, Theoretical Analysis
TL;DR: We investigate constraint learning from a theoretical perspective and provide a scalable algorithm
Abstract: Constraint learning from human feedback (CLHF) has garnered significant interest in the domain of safe reinforcement learning (RL) due to the challenges associated with designing constraints that elicit desired behaviors. However, a comprehensive theoretical analysis of CLHF is still missing. This paper addresses this gap by establishing a theoretical foundation for CLHF. Concretely, trajectory-wise feedback, which is the most natural form of feedback, is shown to be helpful only for learning chance constraints. Building on this insight, we propose and theoretically analyze algorithms for CLHF and for solving chance-constrained RL problems. Our algorithm is empirically shown to outperform an existing baseline.
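For context, a chance-constrained RL problem of the kind referenced in the abstract is typically written as follows (a generic textbook formulation; the paper's exact definition of the cost, threshold, and confidence level may differ):

% Generic chance-constrained RL objective (assumed standard form, not taken from the paper):
% maximize expected return subject to the probability of the cumulative cost
% exceeding a threshold d being at most delta.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\Pr_{\pi}\!\left(\sum_{t=0}^{T} c(s_t, a_t) > d\right) \le \delta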
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1759