Keywords: temporal alignment, time series, offline RL, healthcare, information leakage, data preprocessing
TL;DR: We describe a widespread issue of temporal misalignment that can lead to temporal information leakage and propose a simple fix.
Abstract: Reinforcement learning (RL) is typically applied to environments with well-defined discrete timesteps. However, real-world domains like healthcare often involve irregularly sampled time-series data that require preprocessing. After aggregating the data into fixed-length time windows, it is common practice to align each state with the action that occurred within the same window. We argue that this temporal alignment strategy is problematic, as it effectively allows a policy to rely on future information. Using a toy control task, we demonstrate that the default alignment can result in an incorrect transition function and a learned policy that systematically recommends wrong actions. More worryingly, in a case study of RL for sepsis management on the MIMIC-III dataset, we find that different alignment strategies can produce deceptively similar performance on common global metrics yet yield different treatment recommendations in nearly half of the patient states. Our findings highlight an underappreciated yet critical issue when applying RL to these domains. We advocate for a straightforward fix to prevent temporal information leakage: aligning each state with the action in the next window. Given the prevalence of the temporal misalignment issue in existing literature, we urge the community to carefully reconsider the temporal alignment step, especially when working on RL for high-stakes domains like healthcare.
Submission Number: 12
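To illustrate the alignment issue described in the abstract, here is a minimal sketch of same-window versus next-window state-action pairing. The column names and 4-hour windowing are hypothetical and not taken from the paper; the intent is only to show how shifting actions by one window prevents a state from being paired with an action taken before some of its measurements were recorded.

```python
import pandas as pd

# Hypothetical example: irregularly sampled events aggregated into fixed-length windows.
# "window", "state", and "action" are illustrative names, not the paper's schema.
df = pd.DataFrame({
    "window": [0, 1, 2, 3],
    "state":  ["s0", "s1", "s2", "s3"],   # aggregated measurements per window
    "action": ["a0", "a1", "a2", "a3"],   # treatment administered within the window
})

# Problematic default: pair each state with the action from the SAME window.
# The aggregated state may include measurements recorded after the action was given,
# so a policy trained on these pairs effectively conditions on future information.
leaky_pairs = list(zip(df["state"], df["action"]))

# Proposed fix: pair each state with the action from the NEXT window,
# so the chosen action depends only on information available before it is taken.
df["next_action"] = df["action"].shift(-1)
aligned = df.dropna(subset=["next_action"])
aligned_pairs = list(zip(aligned["state"], aligned["next_action"]))

print(leaky_pairs)    # [('s0', 'a0'), ('s1', 'a1'), ...]  <- leakage-prone alignment
print(aligned_pairs)  # [('s0', 'a1'), ('s1', 'a2'), ...]  <- next-window alignment
```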