NBDI: A Simple and Effective Termination Condition for Skill Extraction from Task-Agnostic Demonstrations

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: Enhance policy learning in downstream tasks by learning skills whose termination is determined by state-action novelty.
Abstract: Intelligent agents are able to make decisions at different levels of granularity and duration. Recent advances in skill learning have enabled agents to solve complex, long-horizon tasks by effectively guiding them in choosing appropriate skills. However, the practice of using fixed-length skills can easily result in skipping valuable decision points, which ultimately limits the potential for further exploration and faster policy learning. In this work, we propose to learn a simple and effective termination condition that identifies decision points through a state-action novelty module trained on agent experience data. Our approach, Novelty-based Decision Point Identification (NBDI), outperforms prior baselines in complex, long-horizon tasks and remains effective even when the environment configurations of downstream tasks vary significantly, highlighting the importance of decision point identification in skill learning.
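To make the fixed-length problem concrete, the sketch below contrasts chopping a trajectory into fixed-length skills with starting a new skill at each decision point (plus a maximum-length cap). It is illustrative only: the function names, the half-open `(start, end)` segment format, and the `max_len` cap are assumptions for this example, not NBDI's implementation, and the decision points are given by hand rather than identified by a novelty module.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open step range [start, end)


def fixed_length_segments(n_steps: int, skill_len: int) -> List[Segment]:
    """Split a trajectory into consecutive skills of at most skill_len steps."""
    return [(i, min(i + skill_len, n_steps)) for i in range(0, n_steps, skill_len)]


def segments_from_decision_points(is_decision_point: Sequence[bool], max_len: int) -> List[Segment]:
    """Start a new skill at every decision point, or when a skill reaches max_len steps.

    Hypothetical helper: is_decision_point[t] is True if the agent should be
    allowed to choose a new skill at step t.
    """
    n = len(is_decision_point)
    if n == 0:
        return []
    segments, start = [], 0
    for t in range(1, n):
        if is_decision_point[t] or t - start == max_len:
            segments.append((start, t))
            start = t
    segments.append((start, n))
    return segments


def skipped_decision_points(segments: List[Segment], is_decision_point: Sequence[bool]) -> List[int]:
    """Decision points that fall strictly inside a skill, where no new choice is possible."""
    starts = {s for s, _ in segments}
    return [t for t, d in enumerate(is_decision_point) if d and t not in starts]


# Toy 10-step trajectory with a single decision point (e.g. a crossroads) at step 3.
decisions = [False] * 10
decisions[3] = True

fixed = fixed_length_segments(10, 4)               # [(0, 4), (4, 8), (8, 10)]
adaptive = segments_from_decision_points(decisions, 4)  # [(0, 3), (3, 7), (7, 10)]
```

With fixed-length skills, `skipped_decision_points(fixed, decisions)` returns `[3]`: the crossroads lies inside the first skill, so the high-level policy cannot react to it. The adaptive segmentation skips nothing.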
Lay Summary: In reinforcement learning, skills (i.e., sequences of low-level actions) help agents solve long-horizon tasks. However, most prior methods rely on fixed-length skills, which can miss critical decision points, such as crossroads in navigation tasks. This limits exploration and slows down policy learning in downstream tasks. We introduce NBDI (Novelty-based Decision Point Identification), a simple yet effective method that uses state-action novelty to detect when a skill should terminate. By quantifying the novelty of each state-action pair, NBDI adaptively assigns variable-length skills from task-agnostic demonstrations, without requiring prior knowledge of the downstream task. Our method enables agents to make new decisions at meaningful points, improving their ability to generalize to new, more complex environments. Empirical results show that NBDI improves performance across diverse domains, including maze navigation and robotic manipulation, and remains effective even under significant changes in the configuration of long-horizon downstream tasks.
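As a rough illustration of how state-action novelty can flag decision points, the sketch below swaps in a simple count-based estimate over discrete state-action pairs: novelty is taken as 1/sqrt(N(s, a)), and pairs above a threshold are treated as candidate points where a skill would terminate. NBDI itself learns its novelty module from agent experience data, so the count-based form, the threshold value, the toy states, and all function names here are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (state, action), assumed discrete and hashable


def fit_counts(demonstrations: Iterable[Iterable[Pair]]) -> Counter:
    """Count how often each (state, action) pair appears in the demonstrations."""
    counts: Counter = Counter()
    for trajectory in demonstrations:
        counts.update(trajectory)
    return counts


def novelty(counts: Counter, state: Hashable, action: Hashable) -> float:
    """Count-based novelty 1/sqrt(N(s, a)); unseen pairs get the maximum value 1.0."""
    return 1.0 / math.sqrt(max(counts[(state, action)], 1))


def decision_points(counts: Counter, trajectory: Sequence[Pair], threshold: float) -> List[int]:
    """Steps whose state-action novelty exceeds the (hypothetical) threshold."""
    return [t for t, (s, a) in enumerate(trajectory) if novelty(counts, s, a) > threshold]


# Toy task-agnostic demonstrations: every demo walks a corridor (c0, c1) the same
# way, then takes a different branch at the crossroads state "x".
demos = [
    [("c0", "right"), ("c1", "right"), ("x", "up")],
    [("c0", "right"), ("c1", "right"), ("x", "down")],
    [("c0", "right"), ("c1", "right"), ("x", "right")],
    [("c0", "right"), ("c1", "right"), ("x", "up")],
]
counts = fit_counts(demos)
flagged = decision_points(counts, [("c0", "right"), ("c1", "right"), ("x", "down")], threshold=0.6)
```

Here the corridor pairs are seen four times (novelty 0.5), while the branching actions at the crossroads are seen once or twice (novelty 1.0 or about 0.71), so `flagged` is `[2]`: the crossroads is the only step where a new skill would start. The design point is that behavior diverging across demonstrations spreads counts over many actions, which raises state-action novelty exactly where a new decision is useful.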
Link To Code: https://github.com/ku-dmlab/NBDI
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Unsupervised Learning, Skill Learning
Submission Number: 5667