Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Reinforcement Learning; Survival Analysis; Recurrent Events
TL;DR: A novel RL objective for recurrent event data that maximizes the probability of inter-event durations exceeding a threshold, achieving lower variance and faster convergence than traditional methods.
Abstract: Constructing multistage optimal decisions for alternating recurrent event data is critically important in medical and healthcare research. Current reinforcement learning (RL) algorithms have only been applied to time-to-event data, with the objective of maximizing expected survival time. However, alternating recurrent event data has a different structure, which motivates us to model the probability and frequency of event occurrences rather than a single terminal outcome. In this paper, we introduce an RL framework specifically designed for alternating recurrent event data. Our goal is to maximize the probability that the duration between consecutive events exceeds a clinically meaningful threshold. To achieve this, we identify a lower bound of this probability, which transforms the problem into maximizing a cumulative sum of log probabilities, thus enabling direct application of standard RL algorithms. We establish the theoretical properties of the resulting optimal policy and demonstrate through numerical experiments that our proposed algorithm yields a larger probability that the time between events exceeds a critical threshold compared with existing state-of-the-art algorithms.
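
The abstract does not spell out the bound; a minimal sketch of how such a decomposition can arise (illustrative only, not the paper's derivation) assumes a Markov factorization of the joint survival event across the $K$ inter-event durations $T_1, \dots, T_K$, with $H_k$ the history up to stage $k$ and $\tau$ the clinical threshold:
\[
\log \Pr\!\Big(\bigcap_{k=1}^{K} \{T_k > \tau\}\Big)
= \sum_{k=1}^{K} \log \Pr\big(T_k > \tau \mid H_k\big),
\]
so defining a per-stage reward $r_k = \log \Pr(T_k > \tau \mid H_k)$ recasts the objective as a cumulative sum that standard RL algorithms can maximize; when the factorization holds only approximately, a lower bound on the left-hand side plays the same role. The symbols $T_k$, $H_k$, $\tau$, and $K$ are assumed notation, not taken from the paper.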
Supplementary Material: zip
Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)
Submission Number: 15207