Self-Consistent Models and Values

Gregory Farquhar; Kate Baumli; Zita Marinho; Angelos Filos; Matteo Hessel; Hado van Hasselt; David Silver

Published: 09 Nov 2021, Last Modified: 26 May 2025, NeurIPS 2021 Poster, Readers: Everyone

Keywords: reinforcement learning, model-based reinforcement learning, planning, value equivalence

Abstract: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. Models enable planning, i.e., using more computation to improve value functions or policies without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly *self-consistent*. This contrasts with classic planning methods like Dyna, which only update the value function to be consistent with the model. We propose a number of possible self-consistency updates, study them empirically in both the tabular and function approximation settings, and find that, with appropriate choices, self-consistency can be useful for both policy evaluation and control.
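The contrast the abstract draws between Dyna-style planning and joint self-consistency can be illustrated with a small tabular sketch. This is not taken from the paper: the function names, the choice of a squared model-based Bellman residual as the consistency objective, and the decision to adjust only the reward model are all assumptions made for illustration. The paper proposes several updates, and this may not match any of them exactly.

```python
def bellman_residual(v, r_hat, p_hat, gamma, s):
    """Model-based TD error at state s: r_hat[s] + gamma * E_{s' ~ p_hat[s]}[v[s']] - v[s]."""
    expected_next = sum(p * v_next for p, v_next in zip(p_hat[s], v))
    return r_hat[s] + gamma * expected_next - v[s]


def dyna_style_update(v, r_hat, p_hat, gamma, lr):
    """Dyna-style step: only the value function moves toward the (fixed) model."""
    return [
        v[s] + lr * bellman_residual(v, r_hat, p_hat, gamma, s)
        for s in range(len(v))
    ]


def joint_update(v, r_hat, p_hat, gamma, lr):
    """Hypothetical joint step: both v[s] and r_hat[s] move to shrink the residual.

    For 0.5 * delta^2, the gradient w.r.t. r_hat[s] is delta, and the
    semi-gradient w.r.t. v[s] (ignoring the bootstrapped term) is -delta.
    """
    new_v, new_r = list(v), list(r_hat)
    for s in range(len(v)):
        delta = bellman_residual(v, r_hat, p_hat, gamma, s)
        new_v[s] = v[s] + lr * delta
        new_r[s] = r_hat[s] - lr * delta
    return new_v, new_r


def total_squared_residual(v, r_hat, p_hat, gamma):
    """Sum of squared model-based Bellman residuals over all states."""
    return sum(
        bellman_residual(v, r_hat, p_hat, gamma, s) ** 2 for s in range(len(v))
    )


if __name__ == "__main__":
    # Toy two-state learned model (values chosen arbitrarily for illustration).
    p_hat = [[0.5, 0.5], [0.0, 1.0]]
    r_hat = [1.0, 0.0]
    v = [0.0, 0.0]
    gamma, lr = 0.9, 0.1

    v_dyna = dyna_style_update(v, r_hat, p_hat, gamma, lr)
    v_joint, r_joint = joint_update(v, r_hat, p_hat, gamma, lr)
    print("before:", total_squared_residual(v, r_hat, p_hat, gamma))
    print("dyna:  ", total_squared_residual(v_dyna, r_hat, p_hat, gamma))
    print("joint: ", total_squared_residual(v_joint, r_joint, p_hat, gamma))
```

In this sketch the joint step reduces the residual faster than the value-only step, but only because the model is allowed to drift toward the value function. A model updated this way would presumably also need to be trained on real experience to stay grounded, and that part is omitted here.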


Supplementary Material: pdf

TL;DR: Maybe we should train models and value functions to be jointly self-consistent.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/self-consistent-models-and-values/code)
