Keywords: Clinical Time Series, Reinforcement Learning, Interpretability, Decision Tree Distillation, Deep Q-Learning
TL;DR: We introduce a two-phase framework that trains deep RL policies on sepsis time series and distills them into compact decision trees, achieving near-perfect fidelity with clinically intuitive interpretability.
Abstract: Sepsis is a complex, life-threatening condition requiring individualized, time-sensitive interventions. Reinforcement learning (RL) has shown promise in optimizing sepsis care, but real-world adoption is hindered by the opacity of its decision-making. We propose a novel two-phase framework that couples deep Q-learning with decision tree distillation for interpretability. Phase I trains deep Q-networks (DQNs) on clinical time series trajectories, exploring ensemble methods and behavior cloning (BC) regularization for improved robustness. Phase II distills the learned policies into shallow, human-readable decision trees using greedy, probabilistic, and Q-regression approaches. Our results show that clinician agreement increases from 0.231 (baseline) to 0.906 (BC-DQN) without degrading policy value, while the distilled trees retain near-perfect fidelity ($\geq 0.998$), improving transparency. This framework can help bridge the trust gap between "black-box" medical AI and interpretable decision support derived from high-dimensional time series.
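To make the distillation step concrete, below is a minimal sketch of the greedy variant: label each observed state with the DQN's greedy action, fit a shallow tree to those labels, and report fidelity as the agreement rate. It assumes a trained PyTorch Q-network `q_net` mapping state features to per-action Q-values and an array `states` of patient states; all names, shapes, and hyperparameters are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of greedy decision-tree distillation of a DQN policy.
import numpy as np
import torch
from sklearn.tree import DecisionTreeClassifier

def distill_greedy(q_net, states, max_depth=5):
    """Fit a shallow tree to imitate the DQN's greedy policy on `states`."""
    with torch.no_grad():
        # Per-action Q-values for every observed state, shape (N, num_actions).
        q_values = q_net(torch.as_tensor(states, dtype=torch.float32))
    # Greedy labels: the action the DQN would take in each state.
    actions = q_values.argmax(dim=1).numpy()
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(states, actions)
    # Fidelity: fraction of states where the tree matches the DQN's action.
    fidelity = float((tree.predict(states) == actions).mean())
    return tree, fidelity
```

The probabilistic and Q-regression variants named in the abstract would presumably fit the tree to softmax action probabilities or directly regress the Q-values with a regression tree, respectively, rather than to hard greedy labels.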
Submission Number: 107