Keywords: Reward Shaping, Linear Temporal Logic
Abstract: There is growing interest in using formal languages such as Linear Temporal Logic (LTL) to specify complex tasks and reward functions for reinforcement learning (RL) precisely and succinctly. Nevertheless, existing methods often assign sparse rewards, which may require millions of exploratory episodes before converging to a high-quality policy. To address this limitation, we adopt the notion of task progression to measure the degree to which a task specified by a co-safe LTL formula is partially completed, and we design several reward functions that incentivize an RL agent to satisfy the task specification as much as possible. We also develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process. Experimental results on a range of benchmark RL environments demonstrate that the proposed approach generally outperforms baselines, achieving earlier convergence to a policy with a higher task-completion success rate and a higher normalized expected discounted return.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8524