Offline Reward Inference on Graph: A New Thinking

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Offline reinforcement learning, Reward learning, Graph
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In offline reinforcement learning, reward inference is key to learning effective policies in practical scenarios. Because environmental interaction is expensive or unethical in domains such as healthcare and robotics, reward functions are rarely accessible, and inferring rewards becomes challenging. To address this issue, our research focuses on developing a reward inference method that leverages a limited number of human reward annotations to infer rewards for unlabelled data. First, we use both the available data and the limited reward annotations to construct a reward propagation graph, whose edge weights incorporate various factors that influence the rewards. We then employ the constructed graph for transductive reward inference, estimating rewards for the unlabelled data. Furthermore, we prove that the iterative transductive inference process admits a fixed point and converges at least to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks substantiate the efficacy of our approach: using our inferred rewards yields substantial performance gains in the offline reinforcement learning framework, particularly when reward annotations are limited.
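The abstract does not specify the exact propagation update, but the described pipeline (a weighted graph over labelled and unlabelled transitions, followed by iterative transductive inference with a fixed point) matches the standard graph label-propagation scheme. The sketch below is a minimal, hypothetical illustration of that idea using a symmetrically normalized edge-weight matrix; the update rule, `alpha`, and iteration count are assumptions, not the paper's method.

```python
import numpy as np

def propagate_rewards(W, rewards, labeled_mask, alpha=0.9, n_iters=50):
    """Transductive reward inference on a graph (generic sketch).

    W            : (n, n) nonnegative edge-weight matrix over transitions.
    rewards      : (n,) reward annotations; entries for unlabelled nodes
                   are ignored.
    labeled_mask : (n,) bool, True where a human annotation exists.
    alpha        : propagation strength (assumed hyperparameter).
    """
    # Symmetrically normalize W: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Y holds the known annotations; zeros for unlabelled nodes.
    Y = np.where(labeled_mask, rewards, 0.0)
    F = Y.copy()
    for _ in range(n_iters):
        # Fixed-point iteration: F* = alpha * S @ F* + (1 - alpha) * Y.
        # With alpha < 1 this contraction converges to a unique fixed point.
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F
```

On a toy chain graph with one annotated node, the inferred reward decays with graph distance from the annotation, which is the qualitative behaviour the transductive step is meant to provide.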
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6978