Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Published: 13 Jun 2025, Last Modified: 28 Jun 2025 · RL4RS 2025 · CC BY 4.0
Keywords: Offline Reinforcement Learning, Healthcare, Sepsis Treatment, Time-Step Discretization, Off-Policy Evaluation
Abstract: Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data were aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of this time-step size, which might distort patient dynamics and lead to suboptimal treatment policies, the extent to which this happens in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($\Delta t=1,2,4,8\ \text{h}$) on this task, following a consistent offline RL pipeline. Our goal was to quantify how time-step size influences state representation learning, model selection, and off-policy evaluation. Our results show that smaller time-step sizes (1 h and 2 h) yielded higher estimated returns than the canonical 4 h setting without reducing the effective sample size (ESS), however this is influenced by how importance ratios are truncated during evaluation. In addition, we found that tailoring the action space definition to the distribution treatments under each time-step size led to improved policy performance. Our work highlights that time-step size and action‐space definition are core design choices that shape policy learning for sepsis treatment.
Submission Number: 20