Keywords: reinforcement learning, healthcare, medical decision making, sparse rewards, value function, policy learning, reward design, probabilistic interpretation
TL;DR: We prove an equivalence relationship among three sparse reward designs commonly used in healthcare RL and empirically probe the assumptions it requires.
Abstract: In reinforcement learning (RL) for healthcare, reward functions often encode clinical endpoints such as survival and death. This results in a sparse reward structure with non-zero rewards only at terminal transitions. However, the exact numerical rewards assigned to survival and death vary across the existing literature, raising the question of whether these different designs optimize the same objective. In this work, we theoretically and empirically examine three common sparse reward designs: survival-only, death-only, and mixed. We prove that, under the assumptions of terminal-only rewards, guaranteed absorption, and no discounting, the value functions of the three designs satisfy an equivalence relationship and induce the same optimal policy. We verify these theoretical results in randomly generated MDPs and demonstrate how relaxing each assumption affects the equivalence relationship. Finally, in a more complex grid-world domain where the assumptions are violated, we find that the survival-only and mixed designs consistently lead to better policies than the death-only design. Our findings provide important initial insights into the choice of sparse reward design and how it shapes policy learning in healthcare RL applications.
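The claimed equivalence can be checked numerically. Below is a minimal sketch (not the paper's code; the MDP construction, the +1/-1 endpoint reward values, and all function names are illustrative assumptions) that runs undiscounted value iteration on a randomly generated absorbing MDP under the three reward designs and confirms that the resulting greedy policies coincide.

```python
# Minimal sketch (illustrative, not the paper's code): under terminal-only
# rewards, guaranteed absorption, and gamma = 1, the survival-only,
# death-only, and mixed designs should yield the same optimal policy.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 3                 # transient states and actions (arbitrary sizes)
SURVIVE, DEATH = n_states, n_states + 1    # two absorbing terminal states

# Random transition kernel over transient + terminal states; boosting the
# terminal columns guarantees absorption from every (state, action) pair.
P = rng.random((n_states, n_actions, n_states + 2))
P[..., SURVIVE:] += 0.5
P /= P.sum(axis=-1, keepdims=True)

def greedy_policy(r_survive, r_death, iters=1000):
    """Undiscounted value iteration with rewards only on terminal transitions."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = (P[..., :n_states] @ V
             + P[..., SURVIVE] * r_survive
             + P[..., DEATH] * r_death)
        V = Q.max(axis=-1)
    return Q.argmax(axis=-1)

pi_survival = greedy_policy(+1.0,  0.0)   # survival-only design
pi_death    = greedy_policy( 0.0, -1.0)   # death-only design
pi_mixed    = greedy_policy(+1.0, -1.0)   # mixed design
assert (pi_survival == pi_death).all() and (pi_survival == pi_mixed).all()
print("All three designs recover the same greedy policy:", pi_survival)
```

With the +1/-1 rewards assumed above, each value function reduces to a function of the survival probability, so V_death = V_survival - 1 and V_mixed = 2 V_survival - 1; these are positive affine transformations, which preserve the greedy argmax and hence the optimal policy.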
Submission Number: 10