- TL;DR: Popular algorithms that cast `"RL as Inference" ignore the role of uncertainty and exploration. We highlight the importance of these issues and present a coherent framework for RL and inference that handles them gracefully.
- Abstract: Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts 'RL as inference' and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces key shortcomings in that approach, and clarifies the sense in which RL can be coherently cast as an inference problem. In particular, an RL agent must consider the effects of its actions upon future rewards and observations: the exploration-exploitation tradeoff. In all but the most simple settings, the resulting inference is computationally intractable so that practical RL algorithms must resort to approximation. We show that the popular 'RL as inference' approximation can perform poorly in even the simplest settings. Despite this, we demonstrate that with a small modification the RL as inference framework can provably perform well, and we connect the resulting algorithm with Thompson sampling and the recently proposed K-learning algorithm.
- Keywords: Reinforcement learning, Bayesian inference, Exploration