Keywords: Behavioral Foundation Models (BFMs), Zero-shot Reinforcement Learning, Zero-shot RL, Representation Learning, Unsupervised RL
Abstract: Behavioral Foundation Models (BFMs) have recently shown promise in producing agents that can adapt to any unknown reward or task. In practice, however, these methods can only produce near-optimal policies for reward functions that lie in the span of some pre-existing state features, so their effectiveness depends heavily on the choice of those features. As a result, BFMs have employed a wide variety of potentially complex objectives to train task-spanning features with different inductive properties. In this work, we examine the question: are these complex representation-learning objectives necessary for zero-shot RL? Specifically, we revisit the objective of self-supervised next-state prediction for state-feature learning, but observe that this objective is prone to increasing state-feature similarity and thereby shrinking the span of reward functions for which we can represent optimal policies. We show that by simply maintaining feature diversity via orthonormal regularization, alongside next-state prediction, we can match or surpass state-of-the-art complex representation-learning methods for zero-shot RL.
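The abstract's core recipe, next-state prediction combined with an orthonormality regularizer on the learned features, can be sketched in a few lines. The following is an illustrative sketch only, not the authors' code: the network shapes, the stop-gradient target, the Gram-matrix form of the regularizer, and the weighting coefficient `lam` are all assumptions made for exposition.

```python
# Hypothetical sketch of next-state prediction + orthonormal feature regularization.
import torch
import torch.nn as nn


class FeatureNet(nn.Module):
    """Maps states s to d-dimensional features phi(s). Architecture is an assumption."""

    def __init__(self, state_dim: int, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def next_state_prediction_loss(phi, predictor, s, a, s_next):
    # Self-supervised objective: predict the features of the next state
    # from the current features and the action (target uses a stop-gradient).
    z = phi(s)
    z_target = phi(s_next).detach()
    z_pred = predictor(torch.cat([z, a], dim=-1))
    return ((z_pred - z_target) ** 2).sum(dim=-1).mean()


def orthonormality_loss(phi, s):
    # Keep feature dimensions diverse by pushing the batch Gram matrix
    # (1/B) * Phi^T Phi toward the identity, discouraging feature collapse.
    z = phi(s)                               # (B, d)
    gram = z.t() @ z / z.shape[0]            # (d, d)
    eye = torch.eye(z.shape[1], device=z.device)
    return ((gram - eye) ** 2).sum()


# Combined objective (lam is a hypothetical trade-off coefficient):
# loss = next_state_prediction_loss(phi, f, s, a, s_next) + lam * orthonormality_loss(phi, s)
```

Under these assumptions, the regularizer is what prevents the next-state prediction objective from driving all features toward similar directions, preserving the span of rewards for which near-optimal policies can be represented.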
Submission Number: 23