A Distributional Analogue to the Successor Representation

Harley Wiltzer; Jesse Farebrother; Arthur Gretton; Yunhao Tang; Andre Barreto; Will Dabney; Marc G Bellemare; Mark Rowland

Published: 01 Aug 2024, Last Modified: 09 Oct 2024 | EWRL17 | CC BY 4.0

Keywords: reinforcement learning, distributional reinforcement learning, risk-aware, successor representation, successor measure

TL;DR: We lift the successor representation to distributional RL, which enables zero-shot distributional policy evaluation.

Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
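To make the "distribution over distributions" idea concrete, here is a minimal tabular sketch. It is not the paper's algorithm: it replaces the learned generative model and the two-level maximum mean discrepancy with plain Monte Carlo rollouts, and it assumes a toy three-state chain. The names `sample_occupancy`, `returns_from_occupancies`, and `cvar`, the transition matrix `P`, the truncation horizon, and the lower-tail CVaR convention are all illustrative assumptions. Each rollout yields one normalized discounted occupancy vector, which plays the role of a single sample from the distributional SM; averaging those samples would recover the (normalized) SR. Once the samples exist, the return distribution for any reward vector follows without further interaction.

```python
import random

def sample_occupancy(P, start, gamma, horizon, rng):
    """Draw one truncated, normalized discounted occupancy vector from `start`.

    Assumption: P[s] is the policy-induced next-state distribution for state s.
    """
    n = len(P)
    occ = [0.0] * n
    state, weight = start, 1.0 - gamma
    for _ in range(horizon):
        occ[state] += weight          # adds (1 - gamma) * gamma^t at step t
        weight *= gamma
        state = rng.choices(range(n), weights=P[state])[0]
    return occ

def returns_from_occupancies(occs, reward, gamma):
    """Turn occupancy samples into return samples for an arbitrary reward vector."""
    scale = 1.0 / (1.0 - gamma)
    return [scale * sum(o * r for o, r in zip(occ, reward)) for occ in occs]

def cvar(samples, alpha):
    """Mean of the worst alpha-fraction of samples (lower-tail convention)."""
    k = max(1, int(alpha * len(samples)))
    return sum(sorted(samples)[:k]) / k

rng = random.Random(0)
# Hypothetical toy chain: state 2 is absorbing.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
gamma = 0.9

# Rollouts are collected once, with no reference to any reward function.
occs = [sample_occupancy(P, 0, gamma, 200, rng) for _ in range(2000)]

# The same samples are then reused for several reward vectors.
for reward in ([1.0, 0.0, 0.0], [0.0, 1.0, -1.0]):
    g = returns_from_occupancies(occs, reward, gamma)
    print(reward, "mean:", sum(g) / len(g), "CVaR(0.1):", cvar(g, 0.1))
```

The design point the sketch tries to show is the separation the abstract describes: the rollouts depend only on the transition structure, while the reward enters only in the final inner product, so risk measures such as CVaR can be evaluated for new rewards without new data.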

Already Accepted Paper At Another Venue: already accepted somewhere else

Submission Number: 100
