Keywords: unsupervised RL, exploration, zero-shot, epistemic uncertainty, ensemble
Abstract: Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings.
Forward-backward (FB) representations have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure.
However, until now, FB and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection.
We argue that FB representations should fundamentally be used for exploration in order to learn more efficiently.
With this goal in mind, we design exploration policies that arise naturally from the FB representation and minimize its posterior variance, thereby minimizing its epistemic uncertainty.
We empirically demonstrate that such principled exploration strategies considerably improve the sample complexity of the FB algorithm compared to other exploration methods.
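To make the core idea in the abstract concrete, below is a minimal sketch (not the authors' code or the published algorithm) of how an ensemble over a forward embedding can supply an epistemic-uncertainty exploration bonus: disagreement across ensemble members stands in for the posterior variance of the FB representation, and the exploratory policy favors actions with high disagreement. All names, dimensions, and the linear placeholder model are hypothetical illustrations.

```python
# Hedged sketch: ensemble disagreement as an epistemic-uncertainty bonus
# for an FB-style forward embedding F_i(s, a, z). Placeholder model only.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, Z_DIM, N_MEMBERS = 4, 2, 8, 5


def init_member():
    """One linear 'forward' embedding F_i(s, a, z) -> R^Z_DIM (toy stand-in)."""
    return rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM + Z_DIM, Z_DIM))


ensemble = [init_member() for _ in range(N_MEMBERS)]


def forward_embedding(W, s, a, z):
    """Apply one ensemble member to the concatenated (state, action, task) input."""
    x = np.concatenate([s, a, z])
    return x @ W


def exploration_bonus(s, a, z):
    """Variance across ensemble members, used as a proxy for the
    posterior variance (epistemic uncertainty) of the representation."""
    preds = np.stack([forward_embedding(W, s, a, z) for W in ensemble])
    return preds.var(axis=0).sum()


def greedy_exploratory_action(s, z, candidate_actions):
    """Pick the candidate action with the largest uncertainty bonus."""
    bonuses = [exploration_bonus(s, a, z) for a in candidate_actions]
    return candidate_actions[int(np.argmax(bonuses))]


# Usage: score a handful of random candidate actions for one state and task vector z.
s = rng.normal(size=STATE_DIM)
z = rng.normal(size=Z_DIM)
candidates = [rng.uniform(-1, 1, size=ACTION_DIM) for _ in range(10)]
print(greedy_exploratory_action(s, z, candidates))
```

In the paper's setting the bonus would be derived from the FB posterior itself rather than a toy linear ensemble; the sketch only illustrates the variance-as-uncertainty mechanism.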
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Núria_Armengol_Urpí1
Track: Fast Track: published work
Publication Link: https://openreview.net/forum?id=rJ0QbzBTRF
Submission Number: 127