Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual information optimization, and temporal representation learning. Although they perform well on exploration, they remain limited in efficiency, especially in high-dimensional environments. In this work, we frame skill discovery as a min-max game between skill generation and policy learning, and propose a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that skill discovery is adversarial to policy learning: skills of weak strength should be explored further, while skills whose strength has converged need less exploration. As an implementation, we score the degree of strength convergence with regret and guide skill discovery with a learnable skill generator. To avoid degeneration, skills are generated by an upgradable population of skill generators. We conduct experiments on environments of varying complexity and dimensionality. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero-shot improvement on high-dimensional environments compared to existing methods.
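To make the loop concrete, below is a minimal, illustrative sketch of the regret-aware mechanism described above: a population of skill generators proposes skills, each skill is scored by its regret (the gap between achievable and achieved return), and low-regret generators, whose skills the policy has already mastered, are refreshed so that exploration concentrates on skills of weak strength. Every name here (SkillGenerator, rollout_return, best_known_return, the population-update rule) is an assumption for illustration, not the paper's actual implementation; the real method also trains the skill-conditioned policy and the generators jointly, which this toy example omits.

```python
# Minimal sketch of a regret-aware skill-discovery loop (illustrative only).
# All names and update rules are assumptions, not the paper's actual API.
import numpy as np

rng = np.random.default_rng(0)

SKILL_DIM = 4   # dimensionality of the latent skill space (assumed)
POP_SIZE = 8    # size of the generator population (assumed)
BATCH = 16      # skills sampled per generator per discovery round

class SkillGenerator:
    """Toy generator: a Gaussian over the skill space."""
    def __init__(self):
        self.mean = rng.normal(size=SKILL_DIM)
        self.scale = np.ones(SKILL_DIM)

    def sample(self, n):
        return self.mean + self.scale * rng.normal(size=(n, SKILL_DIM))

def rollout_return(skill):
    """Stand-in for running the skill-conditioned policy on this skill.
    Toy proxy: the policy is 'stronger' near the origin of skill space."""
    return -np.linalg.norm(skill)

def best_known_return(skill):
    """Stand-in for an estimate of the achievable return for this skill
    (e.g., a learned upper bound); assumed, not from the paper."""
    return 0.0

def regret(skill):
    # Regret = achievable return minus achieved return; high regret marks
    # a skill whose policy strength has not yet converged.
    return best_known_return(skill) - rollout_return(skill)

# Upgradable population of generators: keep those proposing high-regret
# (still-improvable) skills, refresh the rest to avoid degeneration.
population = [SkillGenerator() for _ in range(POP_SIZE)]

for round_ in range(5):
    scores = []
    for gen in population:
        skills = gen.sample(BATCH)
        scores.append(np.mean([regret(z) for z in skills]))
    order = np.argsort(scores)            # ascending: low-regret first
    for idx in order[: POP_SIZE // 2]:    # replace generators whose skills
        population[idx] = SkillGenerator()  # the policy has already mastered
    print(f"round {round_}: mean regret per generator =", np.round(scores, 2))
```

In a full implementation, the policy-learning step would lower the regret of the skills it trains on, so the min-max dynamic emerges: the generators chase high-regret regions while the policy drives regret down.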
AI systems today often struggle to discover new skills efficiently without clear instructions or explicit rewards, which limits their adaptability and usefulness in real-world scenarios. Our research addresses this by introducing Regret-aware Skill Discovery (RSD), a novel method inspired by the idea of learning from past mistakes, or "regrets." In RSD, the AI system actively identifies and practices skills where it previously performed poorly, rather than exploring randomly or trying to maximize all information equally. By deliberately targeting weaker skills, the system rapidly improves its overall performance and skill variety. Through extensive testing, we found that RSD not only learns faster and more efficiently but also enables the AI to perform well in new situations it has never encountered before. This approach can significantly enhance practical applications of AI by making systems more versatile and effective.