Keywords: unsupervised RL, exploration, zero-shot, epistemic uncertainty, ensemble
Abstract: Zero-shot reinforcement learning aims to extract optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings.
Forward-backward representations ($FB$) have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure.
However, $FB$ and many similar zero-shot reinforcement learning algorithms have so far been decoupled from the exploration problem, generally relying on separate exploration algorithms for data collection.
We argue that $FB$ representations should themselves drive exploration in order to learn more efficiently.
With this goal in mind, we design exploration policies that arise naturally from the $FB$ representation and minimize its posterior variance, thereby minimizing its epistemic uncertainty.
We empirically demonstrate that such principled exploration strategies considerably improve the sample efficiency of the $FB$ algorithm in comparison to other exploration methods.
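For concreteness, the following is a minimal sketch of the standard $FB$ factorization together with one plausible ensemble-based exploration signal; the ensemble $\{F_k\}_{k=1}^K$ and the exact form of the intrinsic reward shown here are illustrative assumptions, not the paper's precise objective:
$$M^{\pi_z}(s_0, a_0, X) \;\approx\; \int_X F(s_0, a_0, z)^\top B(s')\, \rho(ds'), \qquad \pi_z(s) = \arg\max_a F(s, a, z)^\top z,$$
where the task embedding $z_r = \mathbb{E}_{s \sim \rho}\!\left[ r(s)\, B(s) \right]$ recovers a policy for any reward $r$ specified at test time. An exploration policy in this spirit could then maximize an intrinsic reward proportional to the ensemble's disagreement, e.g. $r^{\mathrm{expl}}(s, a, z) \propto \operatorname{tr} \operatorname{Cov}_k\!\big[ F_k(s, a, z) \big]$, directing data collection toward regions where the posterior over $F$ is most uncertain.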
Submission Number: 241