Keywords: Unsupervised Reinforcement Learning, Representation Learning
TL;DR: We unify a wide range of unsupervised RL algorithms (GCRL, mutual information skill learning, successor features, world models, etc.) under a common framework based on the successor measure.
Abstract: Many sequential decision-making domains, from robotics to language agents, are naturally multi-task over the same underlying dynamics. Rather than learning a policy for each task separately, unsupervised reinforcement learning (URL) algorithms pretrain without reward, then leverage that pretraining to quickly obtain performant policies for complex tasks. To this end, a wide range of algorithms have been proposed that explicitly or implicitly pretrain a representation that facilitates quickly solving some class of downstream RL problems. Examples include goal-conditioned RL (GCRL), mutual information skill learning (MISL), and successor feature learning (SF), among others. Amid these disparate objectives lies the open problem of selecting the appropriate representation for sequential decision-making in a particular domain. This paper brings a unifying perspective to these distinct algorithmic frameworks, all of which use sequential data in some way to predict future outcomes. First, we show that these seemingly disjoint algorithms are, in fact, approximating a common intractable representation learning objective under differing assumptions. We then illuminate how these methods use embeddings that compress equivalent states to tractably optimize this objective. Finally, we show that the assumptions governing practical URL methods create a performance-efficiency tradeoff that can help guide algorithm selection.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24053