A Probabilistic Perspective on Reinforcement Learning via Supervised Learning

Alexandre Piché; Rafael Pardinas; David Vazquez; Christopher Pal

Published: 27 Apr 2022, Last Modified: 05 May 2023, ICLR 2022 GPL Poster, Readers: Everyone

Keywords: offline rl, control as inference, rl via supervised learning, probabilistic inference

TL;DR: A Probabilistic Perspective on Reinforcement Learning via Supervised Learning algorithms

Abstract: Reinforcement Learning via Supervised Learning (RvS) uses only supervised techniques to learn desirable behaviors from large datasets. RvS has attracted much attention lately due to its simplicity and its ability to leverage diverse trajectories. We introduce Density to Decision (D2D), a new framework that unifies a myriad of RvS algorithms. The Density to Decision framework formulates RvS as a two-step process: i) density estimation via supervised learning and ii) decision making via exponential tilting of the density. Using our framework, we categorise popular RvS algorithms and show how they differ in the design choices of their implementations. We then introduce a novel algorithm, Implicit RvS, which leverages powerful density estimation techniques that can easily be tilted to produce desirable behaviors. We compare the performance of a suite of RvS algorithms on the D4RL benchmark. Finally, we highlight the limitations of current RvS algorithms in comparison with traditional RL algorithms.
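
The two-step process named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy one-step problem with discrete actions and returns, replaces the supervised density model with simple empirical counts, and uses hypothetical names (`estimate_density`, `tilt_to_policy`, `temperature`). Step i) estimates a joint density over (action, return) pairs; step ii) reweights that density by the exponential of the return and marginalises over returns to obtain a distribution over actions.

```python
import math
from collections import Counter


def estimate_density(dataset):
    """Step i (stand-in): empirical joint density over (action, return) pairs.

    A learned density model would be fitted here by supervised learning;
    counting is used only to keep the sketch self-contained.
    """
    counts = Counter(dataset)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def tilt_to_policy(density, temperature=1.0):
    """Step ii: exponentially tilt the density toward high returns.

    Each (action, return) pair is reweighted by exp(return / temperature),
    then returns are summed out to give a distribution over actions.
    """
    weights = {}
    for (action, ret), prob in density.items():
        tilted = prob * math.exp(ret / temperature)
        weights[action] = weights.get(action, 0.0) + tilted
    normaliser = sum(weights.values())
    return {action: w / normaliser for action, w in weights.items()}


# Hypothetical offline data: (action, observed return) pairs.
dataset = [("left", 0.0), ("left", 1.0), ("right", 1.0), ("right", 2.0)]

density = estimate_density(dataset)
policy = tilt_to_policy(density, temperature=0.5)
```

In this sketch a small `temperature` concentrates the policy on actions associated with high returns, while a very large `temperature` leaves the tilted distribution close to the action frequencies in the data.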
