COMPASS: Training-Free Guidance for Skill Discovery with Human Feedback

ICLR 2026 Conference Submission330 Authors

01 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Everyone · CC BY 4.0

Keywords: Reinforcement Learning, Skill Discovery

Abstract: Unsupervised skill discovery (USD) aims to learn diverse behaviors without reward functions, but its uniform exploration often yields task-irrelevant or hazardous behaviors. Guided skill discovery (GSD) addresses this issue by incorporating human intent to focus exploration on meaningful and safe regions. However, existing GSD methods typically rely on pre-defined rules, expert demonstrations, or trained instruction models, which are either costly to obtain or ineffective when human feedback is sparse. To tackle this, we identify a key insight: a semantically coherent skill latent space, in which nearby embeddings correspond to behaviors with similar human desirability, enables training-free guidance from sparse feedback. Building on this insight, we propose COMPASS, a training-free GSD framework that ensures semantic coherence in the latent space. Exploiting this coherence, COMPASS constructs a dense guidance signal directly in the latent space, without training any model beyond the skill policy itself. This guidance signal is then integrated into skill discovery objectives to direct exploration toward human-desirable regions. Theoretical analysis guarantees the reliability of the training-free guidance signal, and extensive experiments across diverse state-based and pixel-based tasks show that COMPASS learns diverse, human-aligned skills, avoids hazardous behaviors, and achieves superior downstream performance with minimal human feedback.
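The abstract does not say how the dense guidance signal is computed. Purely to illustrate the key insight (if nearby latents have similar desirability, a few labels can be spread to unlabeled latents with no training), here is a minimal sketch that assumes a Nadaraya-Watson kernel-smoothing estimator over human-scored skill latents, then adds the result to an intrinsic skill-discovery reward. The estimator choice and all names (`guidance`, `guided_reward`, `bandwidth`, `beta`) are hypothetical; this is not COMPASS's actual formulation.

```python
import math
from typing import List, Sequence, Tuple

# A labeled example: (skill latent z_i, human desirability score s_i).
Labeled = List[Tuple[Sequence[float], float]]


def rbf(z1: Sequence[float], z2: Sequence[float], bandwidth: float) -> float:
    """Gaussian (RBF) similarity between two latent vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def guidance(z: Sequence[float], labeled: Labeled, bandwidth: float = 0.5) -> float:
    """Hypothetical dense guidance: kernel-weighted average of sparse scores.

    No parameters are fitted, so it is training-free; it is only meaningful
    if nearby latents really do share similar desirability (the coherence
    assumption).
    """
    weights = [rbf(z, zi, bandwidth) for zi, _ in labeled]
    total = sum(weights)
    if total < 1e-12:  # far from all feedback: stay neutral
        return 0.0
    return sum(w * s for w, (_, s) in zip(weights, labeled)) / total


def guided_reward(intrinsic: float, z: Sequence[float], labeled: Labeled,
                  beta: float = 1.0, bandwidth: float = 0.5) -> float:
    """Hypothetical shaping: intrinsic USD reward plus weighted guidance."""
    return intrinsic + beta * guidance(z, labeled, bandwidth)


if __name__ == "__main__":
    # Two sparse labels: one desirable skill, one hazardous skill.
    feedback = [((0.0, 0.0), 1.0), ((5.0, 5.0), -1.0)]
    print(guidance((0.1, 0.0), feedback))   # close to +1
    print(guidance((4.9, 5.0), feedback))   # close to -1
    print(guided_reward(0.5, (0.0, 0.0), feedback, beta=2.0))
```

In this sketch, `bandwidth` controls how far a single label propagates: a small value keeps guidance local, while a large value smooths more aggressively and relies more heavily on the coherence assumption.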

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 330
