EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning

ICLR 2026 Conference Submission 17938 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Bayesian RL, epistemic uncertainty, exploration
TL;DR: A Bayesian RL algorithm that leverages epistemic uncertainty for principled exploration, achieving near-optimal guarantees and strong performance in sparse-reward environments.
Abstract: At the boundary between the known and the unknown, an agent inevitably confronts the dilemma of whether to explore or to exploit. Epistemic uncertainty marks such boundaries: it is the systematic uncertainty that arises from limited knowledge. In this paper, we propose a Bayesian reinforcement learning (RL) algorithm, $\texttt{EUBRL}$, which leverages epistemic guidance to achieve principled exploration. This guidance adaptively reduces the per-step regret arising from estimation errors. We establish nearly minimax-optimal regret and sample-complexity guarantees for a specific class of priors in infinite-horizon discounted MDPs. Empirically, we evaluate $\texttt{EUBRL}$ on tasks characterized by sparse rewards, long horizons, and stochasticity; the results demonstrate superior sample efficiency, scalability, and consistency.
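The page does not spell out the algorithm, but the abstract's core idea, using posterior (epistemic) uncertainty to direct exploration, can be illustrated with a minimal tabular sketch. Everything below is an assumption for illustration: the class name, the Dirichlet transition posterior, and the $1/\sqrt{n}$ bonus form are hypothetical stand-ins, not the paper's actual $\texttt{EUBRL}$ method.

```python
import numpy as np

class EpistemicBayesAgent:
    """Illustrative sketch (not EUBRL): tabular Bayesian RL with a Dirichlet
    posterior over transitions and an exploration bonus derived from
    epistemic (posterior) uncertainty."""

    def __init__(self, n_states, n_actions, gamma=0.95, beta=1.0, prior=1.0):
        self.gamma = gamma  # discount factor
        self.beta = beta    # weight on the epistemic bonus (hypothetical)
        # Dirichlet concentration parameters, one per (s, a, s') triple.
        self.alpha = np.full((n_states, n_actions, n_states), prior)
        self.r_sum = np.zeros((n_states, n_actions))  # running reward sums
        self.count = np.zeros((n_states, n_actions))  # visit counts
        self.Q = np.zeros((n_states, n_actions))

    def epistemic_bonus(self, s):
        # The Dirichlet pseudo-count grows with visits, so posterior
        # uncertainty (and hence the bonus) shrinks roughly as 1/sqrt(n).
        n = self.alpha[s].sum(axis=-1)
        return self.beta / np.sqrt(n)

    def act(self, s):
        # Act greedily w.r.t. Q plus the epistemic-uncertainty bonus.
        return int(np.argmax(self.Q[s] + self.epistemic_bonus(s)))

    def update(self, s, a, r, s_next):
        self.alpha[s, a, s_next] += 1.0
        self.r_sum[s, a] += r
        self.count[s, a] += 1.0
        # One-step backup through the posterior-mean model.
        p_mean = self.alpha[s, a] / self.alpha[s, a].sum()
        r_mean = self.r_sum[s, a] / self.count[s, a]
        self.Q[s, a] = r_mean + self.gamma * p_mean @ self.Q.max(axis=1)

# Usage: agent = EpistemicBayesAgent(n_states=10, n_actions=4)
#        a = agent.act(0); agent.update(0, a, r=0.0, s_next=3)
```

In this sketch the bonus decays exactly where the posterior concentrates, so exploration is focused on poorly known state-action pairs; the paper derives its epistemic guidance and the resulting regret bounds from a principled Bayesian analysis rather than from a heuristic bonus of this form.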
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17938