The Reward Hypothesis is False

Joar Max Viktor Skalse; Alessandro Abate

Published: 05 Dec 2022, Last Modified: 05 May 2023 | MLSW2022 | Readers: Everyone

Abstract: The reward hypothesis is the hypothesis that "all of what we mean by goals and purposes can be well thought of as the maximisation of the expected value of the cumulative sum of a received scalar signal" (Sutton and Barto, 2018). In this paper, we will argue that this hypothesis is false. We will look at three natural classes of reinforcement learning tasks (multi-objective reinforcement learning, risk-averse reinforcement learning, and modal reinforcement learning), and then prove mathematically that these tasks cannot be expressed using any scalar, Markovian reward function. We thus disprove the reward hypothesis by providing many examples of tasks which are both natural and intuitive to describe, but which are nonetheless impossible to express using reward functions. In the process, we provide necessary and sufficient conditions for when a multi-objective reinforcement learning problem can be reduced to ordinary, scalar reward reinforcement learning. We also call attention to a new class of reinforcement learning problems (namely those we call "modal" problems), which have so far not been given any systematic treatment in the reinforcement learning literature.
