Epistemically-guided forward-backward exploration

Published: 09 May 2025, Last Modified: 28 May 2025 · RLC 2025 · CC BY 4.0
Keywords: unsupervised RL, exploration, zero-shot, epistemic uncertainty, ensemble
Abstract: Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards, so that they can be adapted quickly to future problem settings. Forward-backward representations ($FB$) have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure. However, up until now, $FB$ and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection. We argue that $FB$ representations should fundamentally be used for exploration in order to learn more efficiently. With this goal in mind, we design exploration policies that arise naturally from the $FB$ representation and that minimize its posterior variance, hence minimizing its epistemic uncertainty. We empirically demonstrate that such principled exploration strategies improve the sample complexity of the $FB$ algorithm considerably in comparison to other exploration methods.
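The abstract describes exploration driven by the epistemic uncertainty of the $FB$ representation, estimated with an ensemble (see the keywords). Below is a minimal illustrative sketch, not the authors' implementation: it assumes an ensemble of forward networks $F_i(s, a, z)$ and a shared backward network $B(s')$, and uses the variance of the ensemble's occupancy estimates $F_i(s,a,z)^\top B(s')$ as an epistemic-uncertainty signal that an exploration policy could target. All names, architectures, and the disagreement computation are assumptions for illustration only.

```python
# Illustrative sketch (hypothetical, not the paper's code): ensemble-based
# epistemic uncertainty for a forward-backward (FB) representation.
import torch
import torch.nn as nn


class ForwardNet(nn.Module):
    """One ensemble member F_i(s, a, z) -> d-dimensional embedding."""

    def __init__(self, state_dim, action_dim, z_dim, d, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))


class BackwardNet(nn.Module):
    """Backward embedding B(s') -> d-dimensional embedding."""

    def __init__(self, state_dim, d, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, s):
        return self.net(s)


def fb_disagreement(forward_ensemble, backward_net, s, a, z, goal_s):
    """Variance across the ensemble of the occupancy estimates
    F_i(s, a, z)^T B(goal_s). High variance marks state-actions where the
    FB model is epistemically uncertain; an exploration policy aiming to
    reduce the posterior variance of the representation would direct data
    collection toward such regions."""
    b = backward_net(goal_s)                               # (batch, d)
    preds = torch.stack(
        [(f(s, a, z) * b).sum(-1) for f in forward_ensemble], dim=0
    )                                                      # (n_ensemble, batch)
    return preds.var(dim=0)                                # (batch,)


if __name__ == "__main__":
    # Toy usage with random tensors, purely to show the shapes involved.
    S, A, Z, D, N = 8, 2, 16, 32, 5
    ensemble = [ForwardNet(S, A, Z, D) for _ in range(N)]
    backward = BackwardNet(S, D)
    s, a = torch.randn(4, S), torch.randn(4, A)
    z, goal = torch.randn(4, Z), torch.randn(4, S)
    print(fb_disagreement(ensemble, backward, s, a, z, goal))  # (4,)
```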
Submission Number: 241