From Black Box to Bedside: Distilling Reinforcement Learning for Interpretable Sepsis Treatment

Published: 23 Sept 2025, Last Modified: 17 Feb 2026
Venue: CogInterp @ NeurIPS 2025 Poster
License: CC BY 4.0
Keywords: Reinforcement Learning, Interpretability, Decision Tree Distillation, Sepsis Treatment Optimization, Deep Q-Learning
TL;DR: We introduce a novel two-phase framework that trains deep RL sepsis treatment policies and distills them into compact decision trees, achieving high fidelity with strong, clinically intuitive interpretability.
Abstract: Sepsis is a complex and life-threatening condition requiring individualized, time-sensitive interventions. Reinforcement learning (RL) has shown promise for optimizing sepsis care, but real-world adoption is hindered by the opacity of its decision-making processes. We propose a novel two-phase framework that couples deep Q-learning with post hoc interpretability via decision tree distillation. Phase I trains deep Q-networks (DQNs) on MIMIC-III ICU trajectories, exploring ensemble methods and behavior cloning (BC) regularization for improved robustness and clinician agreement. Phase II distills the learned policies into shallow, human-readable decision trees using greedy, probabilistic, and Q-regression approaches. Our results show increased clinician agreement from 0.231 (baseline) to 0.906 (BC-DQN), without degrading policy value, while our distilled trees retain near-perfect fidelity ($\geq 0.998$), improving transparency. This framework can help bridge the trust gap between ``black-box'' medical AI and interpretable clinical technologies.
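The Phase II distillation step described in the abstract can be illustrated with a minimal sketch: fit a shallow decision tree to imitate a trained Q-network's greedy action choices, then score fidelity as agreement between student and teacher. The random linear Q-function, state dimension, and action count below are stand-in assumptions, not the paper's actual DQN or MIMIC-III features.

```python
# Hypothetical sketch of greedy policy distillation (not the paper's exact setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in teacher: a fixed linear "Q-network" over 4 state features, 5 actions.
W = rng.standard_normal((4, 5))

def q_values(states):
    # Q(s, a) for all actions at once; shape (n_states, n_actions).
    return states @ W

states = rng.standard_normal((1000, 4))            # surrogate ICU state features
teacher_actions = q_values(states).argmax(axis=1)  # greedy teacher policy labels

# Shallow, human-readable student tree (depth capped for interpretability).
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(states, teacher_actions)

# Fidelity: fraction of states where the tree matches the teacher's action.
fidelity = (tree.predict(states) == teacher_actions).mean()
print(round(float(fidelity), 3))
```

The depth cap is the interpretability lever: a depth-4 tree yields at most 16 leaf rules a clinician can read, and fidelity quantifies how much of the teacher policy survives that compression.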
Submission Number: 52