Keywords: Unsupervised Reinforcement Learning, Representation Learning
TL;DR: We unify a wide range of unsupervised RL algorithms (GCRL, mutual information skill learning, successor features, world models, etc.) under a common framework based on the successor measure.
Abstract: Many sequential decision-making domains, from robotics to language agents, are naturally multi-task over the same underlying dynamics. Rather than learning a policy for each task separately, unsupervised reinforcement learning (URL) algorithms pretrain without reward, then leverage that pretraining to quickly obtain performant policies for complex tasks. To this end, a wide range of algorithms have been proposed that explicitly or implicitly pretrain a representation that facilitates quickly solving some class of downstream RL problems. Examples include goal-conditioned RL (GCRL), mutual information skill learning (MISL), and successor feature learning (SF), among others. Amid these disparate objectives lies the open problem of selecting the appropriate representation for sequential decision-making in a particular domain. This paper brings a unifying perspective to these distinct algorithmic frameworks, all of which use sequential data in some way to predict future outcomes. First, we show that these seemingly disjoint algorithms are, in fact, approximating a common intractable representation learning objective under differing assumptions. We then illuminate how these methods use embeddings that compress equivalent states to tractably optimize this objective. Finally, we show that the assumptions governing practical URL methods create a performance-efficiency tradeoff that can help guide algorithm selection.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24053