Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Published: 23 Sept 2025 · Last Modified: 18 Oct 2025 · TS4H NeurIPS 2025 · CC BY 4.0
Keywords: time-step discretization, offline RL, healthcare, time series
Abstract: Existing studies on reinforcement learning (RL) for sepsis management have mostly aggregated patient data into 4-hour time steps. Although this coarseness may distort patient dynamics and lead to suboptimal policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted controlled experiments across four time-step sizes ($\Delta t = 1, 2, 4, 8$ h), following an identical offline RL pipeline to quantify the effects on state representation learning, behavior cloning, policy training, and off-policy evaluation. Under our model-selection criteria, the 1 h time-step size yielded the highest estimated returns; however, we caution that this naive comparison is not "fair," because evaluations at different time-step sizes make different assumptions about the underlying problem. Our work highlights that time-step size is a core design choice in offline RL for healthcare and emphasizes the importance of thoughtful evaluation.
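
For concreteness, the core manipulation, re-discretizing irregular ICU time series at different step sizes before running the RL pipeline, can be sketched as follows. This is a minimal, hypothetical Python/pandas sketch; the long-format table layout, the column names `stay_id` and `charttime`, and mean-pooling within each bin are our assumptions, not the paper's exact preprocessing.

```python
# Hypothetical sketch: bin irregular ICU measurements into fixed dt-hour steps.
# Assumes a long-format table with columns 'stay_id', 'charttime', and numeric
# features; column names and mean-pooling are illustrative assumptions.
import pandas as pd

def discretize(df: pd.DataFrame, dt_hours: int) -> pd.DataFrame:
    """Aggregate each patient's measurements into dt_hours-sized time steps."""
    df = df.copy()
    df["charttime"] = pd.to_datetime(df["charttime"])
    return (
        df.set_index("charttime")
          .groupby("stay_id")
          .resample(f"{dt_hours}h")
          .mean(numeric_only=True)   # mean-pool all measurements in each bin
          .reset_index()
    )

# One trajectory dataset per candidate step size, as in the paper's comparison:
# datasets = {dt: discretize(vitals, dt) for dt in (1, 2, 4, 8)}
```

Each resulting dataset would then feed the same downstream pipeline (state representation learning, behavior cloning, policy training, off-policy evaluation), so that $\Delta t$ is the only varied factor.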
Submission Number: 114