Illuminating the Unseen: A Large-Scale Exploration of Bias in ICU Discharge Summaries via Language Models

Published: 19 Aug 2025 · Last Modified: 12 Oct 2025 · BHI 2025 Oral · License: CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: Discharge Summaries, Large Language Model, Medical Biases
Abstract: Discharge summaries (DS) are pivotal to patient care transitions, yet their completeness and quality can vary widely. This study uses large language models (LLMs) to evaluate DS quality across more than 50,000 ICU discharges from the MIMIC-IV dataset, aiming to quantify compliance with standardized documentation criteria and to identify potential biases across demographic subgroups. We adopted 19 established clinical metrics, grouped into five major DS components (i.e., Reason for Hospitalization, Significant Findings, Procedures and Treatment, Patient’s Discharge Condition, and Patient and Family Instructions). Each DS was automatically annotated via LLM-based prompt engineering, producing categorical labels (Fully, Partial, Unacceptable, Missing). We then conducted numeric score-based and level-wise statistical analyses to detect variations in DS quality across race, insurance type, chief complaint, and admission type. While Reason for Hospitalization was generally well documented, up to 10% of DSs lacked sufficient Patient and Family Instructions and 3-10% had incomplete Discharge Condition details. Statistically significant disparities (p < 0.05) were observed among subgroups, with higher rates of negative scores (i.e., Missing or Unacceptable) in certain demographic categories, notably Asian males insured under less common plans (neither Medicare nor Medicaid), where over 7% of DSs contained deficiencies, more than twice the overall average.
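
The abstract describes mapping LLM-produced categorical labels to numeric scores and running level-wise statistical comparisons across demographic subgroups. Below is a minimal illustrative sketch of that kind of analysis, not the authors' actual pipeline: the column names, the score mapping, and the toy data are assumptions for demonstration, and the chi-square test stands in for whatever level-wise test the paper used.

```python
# Illustrative sketch only (not the paper's code): turn per-DS categorical
# labels into numeric scores and test for level-wise disparities across a
# demographic attribute (here, race) with a chi-square test.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical annotated data: one row per discharge summary,
# with the LLM-assigned label for a single DS component.
df = pd.DataFrame({
    "race":  ["White", "Asian", "Black", "Asian", "White", "Black"],
    "label": ["Fully", "Missing", "Partial", "Unacceptable", "Fully", "Fully"],
})

# Assumed numeric scoring of labels for the score-based analysis.
score_map = {"Fully": 2, "Partial": 1, "Unacceptable": -1, "Missing": -2}
df["score"] = df["label"].map(score_map)
print(df.groupby("race")["score"].mean())  # mean score per subgroup

# Level-wise analysis: chi-square test on the subgroup x label contingency table.
table = pd.crosstab(df["race"], df["label"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
```

In practice, the same comparison would be repeated for each of the 19 metrics and for other attributes (insurance type, chief complaint, admission type), with significance judged at p < 0.05 as stated in the abstract.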
Track: 4. Clinical Informatics
Registration Id: 55NCNHYNQ3J
Submission Number: 4