Parameterized projected Bellman operator

Théo Vincent; Alberto Maria Metelli; Jan Peters; Marcello Restelli; Carlo D'Eramo

Published: 01 Feb 2023, Last Modified: 22 Jun 2025, Submitted to ICLR 2023, Readers: Everyone

Keywords: reinforcement learning, bellman operator, operator learning, approximate value iteration

TL;DR: A novel reinforcement learning approach that obtains an approximation of the Bellman operator to overcome the limitations of the regular Bellman operator.

Abstract: The Bellman operator is a cornerstone of reinforcement learning, widely used in a plethora of works, from value-based methods to modern actor-critic approaches. In problems with unknown models, the Bellman operator requires transition samples that strongly determine its behavior, as uninformative samples can result in negligible updates or long detours before reaching the fixed point. In this work, we introduce the novel idea of obtaining an approximation of the Bellman operator, which we call the projected Bellman operator (PBO). Our PBO is a parametric operator on the parameter space of a given value function. Given the parameters of a value function, PBO outputs the parameters of a new value function and, like a standard Bellman operator, converges to a fixed point when applied repeatedly. Notably, our PBO can approximate repeated applications of the true Bellman operator in a single step, whereas the standard Bellman operator must be applied sequentially. We prove the important consequences of this finding for different classes of problems by analyzing PBO in terms of stability, convergence, and approximation error. Finally, we propose an approximate value-iteration algorithm to show how PBO can overcome the limitations of classical methods, opening up multiple research directions as a novel paradigm in reinforcement learning.
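The abstract describes PBO only at a high level. As a rough illustration, here is a minimal sketch under simplifying assumptions that are not taken from the paper: a hypothetical 3-state Markov chain evaluated under a fixed policy, a tabular value function (so the projection onto the parameter space is trivial), and a PBO restricted to an affine map θ ↦ Aθ + b, fitted by least squares to sample-based Bellman targets computed for random parameter vectors. The helper names (`collect_samples`, `empirical_bellman`, `fit_linear_pbo`, `compose`) are hypothetical. The sketch shows the two properties the abstract highlights: iterating the learned operator on parameters converges to a fixed point, and K applications can be collapsed into one map that is applied once. It is not the authors' algorithm.

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy (policy evaluation).
# These numbers are illustrative assumptions, not taken from the paper.
GAMMA = 0.9
N_STATES = 3
P_TRUE = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]])
R_TRUE = np.array([0.0, 0.0, 1.0])


def collect_samples(n_samples, rng):
    """Draw transitions (s, r, s') with start states chosen uniformly at random."""
    s = rng.integers(N_STATES, size=n_samples)
    s_next = np.array([rng.choice(N_STATES, p=P_TRUE[i]) for i in s])
    return s, R_TRUE[s], s_next


def empirical_bellman(theta, samples):
    """Sample-based Bellman target per state: mean of r + GAMMA * theta[s']."""
    s, r, s_next = samples
    targets = r + GAMMA * theta[s_next]
    counts = np.bincount(s, minlength=N_STATES)
    return np.bincount(s, weights=targets, minlength=N_STATES) / counts


def fit_linear_pbo(samples, n_params, rng):
    """Hypothetical affine PBO theta -> A @ theta + b, fit by least squares
    on random parameter vectors and their sample-based Bellman targets."""
    thetas = rng.normal(size=(n_params, N_STATES))
    targets = np.stack([empirical_bellman(t, samples) for t in thetas])
    X = np.hstack([thetas, np.ones((n_params, 1))])  # bias column for b
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)  # X @ W ~= targets
    return W[:N_STATES].T, W[N_STATES]                # A, b


def compose(A, b, k):
    """Collapse k applications of theta -> A @ theta + b into one affine map."""
    A_k, b_k = np.eye(len(b)), np.zeros_like(b)
    for _ in range(k):
        A_k, b_k = A @ A_k, A @ b_k + b
    return A_k, b_k


rng = np.random.default_rng(0)
samples = collect_samples(2000, rng)
A, b = fit_linear_pbo(samples, n_params=50, rng=rng)

# Sequential use: iterate the learned operator on parameters from theta = 0.
theta = np.zeros(N_STATES)
for _ in range(200):
    theta = A @ theta + b
fixed_point = np.linalg.solve(np.eye(N_STATES) - A, b)

# "At once" use: a single application of the 200-fold composition.
A_200, b_200 = compose(A, b, 200)
theta_once = A_200 @ np.zeros(N_STATES) + b_200

v_true = np.linalg.solve(np.eye(N_STATES) - GAMMA * P_TRUE, R_TRUE)
print("iterated PBO:", theta)
print("composed PBO:", theta_once)
print("true values: ", v_true)
```

Because the sample-based Bellman operator for policy evaluation is already affine in θ (here A equals γ times the empirical transition matrix and b the per-state rewards), the least-squares fit is exact, A has spectral radius γ < 1, and the iterates converge to the value function of the empirical chain, which is close to but not identical to `v_true`. With the Bellman optimality operator, which contains a max, or with function approximation, an affine PBO could only approximate the target operator, so this sketch illustrates only the parameter-to-parameter interface and the composition property.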

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/parameterized-projected-bellman-operator/code)
