A Unified Framework for Unsupervised Reinforcement Learning Algorithms

Published: 01 Jul 2025, Last Modified: 01 Jul 2025. RLBrew: Ingredients for Developing Generalist Agents workshop (RLC 2025). License: CC BY 4.0
Keywords: Unsupervised Reinforcement Learning, Representation Learning, Reinforcement Learning Theory
TL;DR: This work unifies a wide range of unsupervised RL algorithms (GCRL, mutual-information skill learning, forward-backward representations, etc.) by presenting the successor measure as their shared objective and state equivalences as their shared compression method.
Abstract:

Many sequential decision-making domains, from robotics to language agents, are naturally multi-task: many tasks share the same underlying dynamics. Rather than learning each task separately, unsupervised reinforcement learning (RL) algorithms pretrain without rewards, then leverage that pretraining to quickly obtain optimal policies for complex tasks. To this end, a wide range of algorithms have been proposed to explicitly or implicitly pretrain a representation that facilitates quickly solving some class of downstream RL problems. Examples include goal-conditioned RL (GCRL), mutual information skill learning (MISL), forward-backward representation learning (FB), and controllability representations. This paper brings these heretofore distinct algorithmic frameworks together into a unified view. First, we show that these algorithms are, in fact, approximating the same intractable representation learning objective, the successor measure (the discounted, policy-dependent distribution over future states and actions), under different assumptions. We then illustrate that, to make these methods tractable, practical applications of these algorithms utilize embeddings that can be described under the framework of state equivalences. Through this work, we highlight shared underlying properties that characterize core problems in unsupervised RL.
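For reference, a standard definition of the successor measure mentioned above is sketched below; exact conventions vary slightly across the cited algorithm families (e.g., normalization by $1-\gamma$, or whether time indexing starts at the current or the next state).

```latex
% Successor measure of policy \pi from state-action pair (s, a):
% the discounted occupancy of future states in a measurable set X \subseteq \mathcal{S}.
M^{\pi}(s, a; X)
  \;=\;
  \sum_{t=0}^{\infty} \gamma^{t}\,
  \Pr\!\left( s_{t+1} \in X \;\middle|\; s_{0}=s,\; a_{0}=a,\; \pi \right).
```

A Q-function for any reward is then recovered by integrating that reward against this measure, $Q^{\pi}_{r}(s,a) = \int r \, \mathrm{d}M^{\pi}(s,a;\cdot)$, which is the sense in which the successor measure serves as a task-agnostic pretraining target.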

Submission Number: 21