Towards Safe and Generalizable Treatment Strategies in Healthcare via RL and PAC-Bayesian Computations
Keywords: Reinforcement Learning, PAC-Bayes, Healthcare
TL;DR: Reinforcement learning (RL) could personalize treatments but lacks the reliable generalization guarantees that are critical in healthcare.
Abstract: Reinforcement learning (RL) offers a promising paradigm for optimizing treatment strategies that adapt over time to patient responses. However, the deployment of RL in clinical settings is hindered by the lack of generalization guarantees, an especially critical concern given the high-stakes nature of this domain. Existing generalization bounds for sequence data are either vacuous or rely on relaxations of the independence condition, which often produce non-sharp bounds and limit their applicability to RL. In this work, we derive a novel PAC-Bayesian generalization bound for RL that explicitly accounts for temporal dependencies arising from Markovian data. Our key technical contribution integrates a bounded-differences condition on the negative empirical return to establish the applicability of a McDiarmid-style concentration inequality tailored to dependent sequences such as Markov Decision Processes. This leads to a PAC-Bayes bound with explicit dependence on the Markov chain’s mixing time. We show that our bound can be directly applied to off-policy RL algorithms in continuous control settings, such as Soft Actor-Critic. Empirically, we demonstrate that our bound yields meaningful confidence certificates for treatment policies in simulated healthcare environments, providing high-probability guarantees on policy performance. Our framework equips practitioners with a tool to assess whether an RL-based intervention meets predefined safety thresholds. Furthermore, by closing the gap between learning theory and clinical applicability, this work advances the development of reliable RL systems for sensitive domains such as personalized healthcare.
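For context on the type of guarantee described above: the paper's Markovian bound is not reproduced on this page, but the classical McAllester-style PAC-Bayes bound for i.i.d. data (stated up to constants) that such results extend reads as follows. Here $P$ is a prior and $Q$ a posterior over policies, $R$ and $\widehat{R}$ the expected and empirical risk, and $n$ the sample size.

```latex
% Classical McAllester-style PAC-Bayes bound (i.i.d. case, up to constants):
% with probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q,
R(Q) \;\le\; \widehat{R}(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

PAC-Bayes bounds for dependent sequences typically replace $n$ with an effective sample size deflated by the chain's mixing time; the abstract's "explicit dependence on the Markov chain's mixing time" suggests a factor of this flavor, though the exact form of the paper's bound is not stated here.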
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 27635