A Distributional Analogue to the Successor Representation

Jesse Farebrother; Harley Wiltzer; Arthur Gretton; Yunhao Tang; Andre Barreto; Will Dabney; Marc G Bellemare; Mark Rowland

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: reinforcement learning, distributional reinforcement learning, successor representation, successor measure, geometric horizon models, gamma models, risk-aware

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: Extending the successor representation to distributional RL, which enables zero-shot distributional policy evaluation.

Abstract: This paper contributes a new approach for distributional reinforcement learning which allows for a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences from behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We model the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Extending γ-models (Janner et al., 2020), we propose an algorithm that learns the distributional SM from samples by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable in the context of learning generative models of state. As an illustration of the practical usefulness of the distributional successor measure, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
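For readers unfamiliar with the successor representation, the separation of transition structure and reward can be illustrated in the classical, expected-value setting. The sketch below is not the paper's method: it covers only the standard tabular SR, not the distributional SM or the two-level maximum mean discrepancy. The 3-state transition matrix, the discount factor, and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def successor_representation(P, gamma):
    """Tabular SR for a fixed policy: M = sum_t gamma^t P^t = (I - gamma P)^{-1}.

    P is the state-to-state transition matrix induced by the policy.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def evaluate(M, r):
    """Value function for reward vector r, using only the precomputed SR."""
    return M @ r

# Hypothetical 3-state chain under a fixed policy (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
])
gamma = 0.9

# The SR depends on the transition structure only, so it is computed once.
M = successor_representation(P, gamma)

# Two different reward functions are then evaluated without further learning.
r_a = np.array([1.0, 0.0, 0.0])
r_b = np.array([0.0, 0.0, 1.0])
v_a = evaluate(M, r_a)
v_b = evaluate(M, r_b)
```

Because the value is linear in the reward, a new reward function costs one matrix-vector product. The SR alone yields only expected returns; the abstract's distributional SM is aimed at recovering the full return distribution in the same zero-shot manner.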

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7962
