Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Published: 23 Sept 2025 · Last Modified: 18 Oct 2025 · TS4H NeurIPS 2025 · CC BY 4.0
Keywords: time-step discretization, offline RL, healthcare, time series
Abstract: Existing studies on reinforcement learning (RL) for sepsis management have mostly aggregated patient data into 4-hour time steps. Although this coarseness may distort patient dynamics and lead to suboptimal policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted controlled experiments across four time-step sizes ($\Delta t = 1, 2, 4, 8$ h), following an identical offline RL pipeline to quantify the effects on state representation learning, behavior cloning, policy training, and off-policy evaluation. Under our model-selection criteria, the 1 h time-step size yielded the highest estimated returns; however, we caution that this naive comparison is not "fair," because evaluations at different time-step sizes make different assumptions about the underlying problem. Our work highlights that time-step size is a core design choice in offline RL for healthcare and emphasizes the importance of thoughtful evaluation.
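
For concreteness, the core manipulation, re-discretizing irregular ICU time series at different step sizes before running the RL pipeline, can be sketched as follows. This is a minimal, hypothetical Python/pandas sketch; the long-format table layout, the column names `stay_id` and `charttime`, and mean-pooling within each bin are our assumptions, not the paper's exact preprocessing.

```python
# Hypothetical sketch: bin irregular ICU measurements into fixed dt-hour steps.
# Assumes a long-format table with columns 'stay_id', 'charttime', and numeric
# features; column names and mean-pooling are illustrative assumptions.
import pandas as pd

def discretize(df: pd.DataFrame, dt_hours: int) -> pd.DataFrame:
    """Aggregate each patient's measurements into dt_hours-sized time steps."""
    df = df.copy()
    df["charttime"] = pd.to_datetime(df["charttime"])
    return (
        df.set_index("charttime")
          .groupby("stay_id")
          .resample(f"{dt_hours}h")
          .mean(numeric_only=True)   # mean-pool all measurements in each bin
          .reset_index()
    )

# One trajectory dataset per candidate step size, as in the paper's comparison:
# datasets = {dt: discretize(vitals, dt) for dt in (1, 2, 4, 8)}
```

Each resulting dataset would then feed the same downstream pipeline (state representation learning, behavior cloning, policy training, off-policy evaluation), so that $\Delta t$ is the only varied factor.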
Submission Number: 114