Keywords: credit assignment, delayed rewards, noised rewards, Preference-based Reinforcement Learning, treatment of sepsis
Abstract: Credit assignment is a common technique in reinforcement learning for identifying the past state-action pairs most relevant to the final outcome and assigning rewards to them, especially in environments with delayed rewards. However, current reward function design methods rely heavily on domain knowledge and may not accurately reflect the reward that should actually be received, which introduces noisy reward assignments during credit assignment and degrades the agent's performance. To address this issue, we leverage Preference-based Reinforcement Learning (PbRL) and propose a novel trajectory preference-based credit assignment method, in which each trajectory is assigned to one of three preference classes according to its delayed reward and the entire trajectory space. A causal Transformer framework is then introduced to predict the relevance between the decision at each timestep and the different trajectory preferences, which guides the credit assignment. Despite the unavoidable noisy reward associated with each trajectory, we demonstrate that our method can still effectively guide agents to learn superior strategies. Experiments on the MuJoCo task and on sepsis treatment under an extremely delayed reward setting show that our method mitigates the adverse effects of delayed, noisy rewards and provides effective guidance for agents.
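The trajectory-labeling step can be illustrated with a minimal sketch. The abstract does not state how the three preference classes are derived, so the quantile thresholds, the function name label_preferences, and the class encoding (0 = low, 1 = middle, 2 = high) are assumptions made for illustration only, not the authors' procedure. The sketch ranks each trajectory's delayed return against all returns in the dataset and assigns one of three labels.

```python
import numpy as np


def label_preferences(returns, low_q=1 / 3, high_q=2 / 3):
    """Assign each trajectory one of three preference labels.

    Hypothetical scheme (not taken from the paper): a trajectory is
    labeled by where its delayed return falls relative to quantiles
    computed over the whole set of trajectories.
    0 = low preference, 1 = middle, 2 = high preference.
    """
    returns = np.asarray(returns, dtype=float)
    # Thresholds depend on the entire trajectory set, not a fixed scale.
    low_thr, high_thr = np.quantile(returns, [low_q, high_q])
    labels = np.ones(len(returns), dtype=int)  # default: middle class
    labels[returns <= low_thr] = 0
    labels[returns >= high_thr] = 2
    return labels


if __name__ == "__main__":
    # One delayed return per trajectory (toy values).
    episode_returns = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    print(label_preferences(episode_returns))
```

Because the labels depend only on the rank of a return within the dataset, moderate noise in an individual delayed reward changes a trajectory's label only when it pushes the return across a threshold, which is one plausible reason a coarse preference signal can be more robust than the raw reward.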
Primary Area: reinforcement learning
Submission Number: 10816