Keywords: reinforcement learning, uncertainty, Bellman operator, proximal policy optimization
TL;DR: We introduce a Bellman operator on distributions and modify PPO into an uncertainty-aware variant.
Abstract: Uncertainty quantification remains a difficult challenge in reinforcement learning. Several algorithms successfully quantify uncertainty in practical settings; however, it is unclear whether they are theoretically sound and can be expected to converge. Furthermore, these algorithms appear to treat uncertainty over the target parameters in different ways. In this work, we unify several practical algorithms in a single theoretical framework by defining a new Bellman operator on distributions, and we show that this operator is a contraction. Building on this theory, we then modify PPO, a popular modern model-free algorithm, into an uncertainty-aware variant to showcase the general applicability of our main result.
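For context, a generic LaTeX sketch of the kind of Bellman operator and contraction property the abstract refers to, assuming the standard setup with discount factor $\gamma \in [0,1)$; the symbols $\mathcal{T}^{\pi}$, $\eta$, and the metric $\bar{d}$ are placeholders for illustration and are not the specific construction of this work:

% Classical Bellman operator on value functions (standard definition):
\[
  (\mathcal{T}^{\pi} V)(s) \;=\; \mathbb{E}_{a \sim \pi(\cdot \mid s),\, s' \sim P(\cdot \mid s, a)}\bigl[\, r(s, a) + \gamma\, V(s') \,\bigr].
\]

% A distributional analogue acts on return distributions \eta(s,a); showing it is a
% contraction in some metric \bar{d} means
\[
  \bar{d}\bigl(\mathcal{T}^{\pi} \eta_1,\; \mathcal{T}^{\pi} \eta_2\bigr) \;\le\; \gamma\, \bar{d}(\eta_1, \eta_2),
\]

so that, by Banach's fixed-point theorem, repeated application converges to a unique fixed point.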
Submission Number: 36