Keywords: temporal alignment, time series, offline RL, healthcare, information leakage, data preprocessing
TL;DR: We describe a widespread issue of temporal misalignment that can lead to temporal information leakage and propose a simple fix.
Abstract: Reinforcement learning (RL) is typically applied to environments with well-defined discrete timesteps. However, real-world domains like healthcare often involve irregularly sampled time-series data that require preprocessing. After aggregating the data into fixed-length time windows, it is common practice to align each state with the action that occurred within the same window. We argue that this temporal alignment strategy is problematic, as it effectively allows a policy to rely on future information. Using a toy control task, we demonstrate that the default alignment can result in an incorrect transition function and a learned policy that systematically recommends wrong actions. More worryingly, in a case study of RL for sepsis management on the MIMIC-III dataset, we find that different alignment strategies can produce deceptively similar performance on common global metrics yet yield different treatment recommendations in nearly half of the patient states. Our findings highlight an underappreciated yet critical issue when applying RL to these domains. We advocate for a straightforward fix to prevent temporal information leakage: aligning each state with the action in the next window. Given the prevalence of the temporal misalignment issue in existing literature, we urge the community to carefully reconsider the temporal alignment step, especially when working on RL for high-stakes domains like healthcare.
Submission Number: 12
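To illustrate the alignment issue described in the abstract, here is a minimal sketch of same-window versus next-window state-action pairing. The column names and 4-hour windowing are hypothetical and not taken from the paper; the intent is only to show how shifting actions by one window prevents a state from being paired with an action taken before some of its measurements were recorded.

```python
import pandas as pd

# Hypothetical example: irregularly sampled events aggregated into fixed-length windows.
# "window", "state", and "action" are illustrative names, not the paper's schema.
df = pd.DataFrame({
    "window": [0, 1, 2, 3],
    "state":  ["s0", "s1", "s2", "s3"],   # aggregated measurements per window
    "action": ["a0", "a1", "a2", "a3"],   # treatment administered within the window
})

# Problematic default: pair each state with the action from the SAME window.
# The aggregated state may include measurements recorded after the action was given,
# so a policy trained on these pairs effectively conditions on future information.
leaky_pairs = list(zip(df["state"], df["action"]))

# Proposed fix: pair each state with the action from the NEXT window,
# so the chosen action depends only on information available before it is taken.
df["next_action"] = df["action"].shift(-1)
aligned = df.dropna(subset=["next_action"])
aligned_pairs = list(zip(aligned["state"], aligned["next_action"]))

print(leaky_pairs)    # [('s0', 'a0'), ('s1', 'a1'), ...]  <- leakage-prone alignment
print(aligned_pairs)  # [('s0', 'a1'), ('s1', 'a2'), ...]  <- next-window alignment
```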