Learning variable-length skills through Novelty-based Decision Point Identification

20 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Unsupervised Learning, Skill Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Accelerate policy learning in downstream tasks by learning variable-length skills through state-action novelties.
Abstract: Intelligent agents make decisions at different levels of granularity and over different durations. Recent advances in skill learning with data-driven behavior priors have enabled agents to solve complex, long-horizon tasks by guiding them toward appropriate skills. However, fixed-length skills can easily skip valuable decision points, which limits exploration and slows policy learning. For example, making a temporally extended decision at a crossroad can offer more direct access to parts of the state space that would otherwise be hard to reach. In this work, we propose to learn variable-length skills by identifying decision points with a state-action novelty module that leverages offline agent experience datasets; state-action novelty turns out to be an efficient proxy for critical decision point detection. We show that capturing critical decision points further accelerates policy learning by enabling more efficient exploration of the state space and facilitating knowledge transfer across tasks. Our approach, NBDI (Novelty-based Decision Point Identification), substantially outperforms previous baselines on complex, long-horizon tasks (e.g., robotic manipulation and maze navigation), which highlights the importance of decision point identification in skill learning.
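Illustrative sketch (not the authors' implementation): the abstract does not say how state-action novelty is computed or how a skill is terminated, so the following makes its own assumptions. It assumes discrete, hashable states and actions, a count-based novelty score 1/sqrt(N(s, a) + 1) estimated from the offline dataset, and a rule that closes the running skill just before any state-action pair whose novelty exceeds a threshold. The names `fit_counts`, `novelty`, `split_into_skills`, `threshold`, and `max_len` are hypothetical.

```python
from collections import Counter
from math import sqrt


def fit_counts(trajectories):
    """Count how often each (state, action) pair occurs in the offline data."""
    counts = Counter()
    for traj in trajectories:
        counts.update(traj)  # each element is a hashable (state, action) tuple
    return counts


def novelty(counts, state, action):
    """Assumed count-based novelty: rarely seen pairs score close to 1."""
    return 1.0 / sqrt(counts[(state, action)] + 1)


def split_into_skills(traj, counts, threshold, max_len):
    """Cut a trajectory into variable-length segments.

    Assumption: a pair whose novelty exceeds `threshold` is treated as a
    decision point, so the running segment is closed just before it.
    `max_len` caps the segment length.
    """
    skills, current = [], []
    for state, action in traj:
        is_decision_point = novelty(counts, state, action) > threshold
        if current and (is_decision_point or len(current) >= max_len):
            skills.append(current)
            current = []
        current.append((state, action))
    if current:
        skills.append(current)
    return skills


# Toy offline data: a corridor that is usually followed to the right,
# with a single demonstration that turns up at the junction "s2".
common = [("s0", "right"), ("s1", "right"), ("s2", "right")]
rare = [("s0", "right"), ("s1", "right"), ("s2", "up")]
counts = fit_counts([common] * 8 + [rare])
skills = split_into_skills(rare, counts, threshold=0.5, max_len=10)
```

In this toy dataset the rarely taken turn at the junction is the only pair above the threshold, so the trajectory is split there rather than at a fixed horizon.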
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2343