What If The Patient Were Different? A Framework To Audit Biases and Toxicity in LLM Clinical Note Generation

ACL ARR 2025 May Submission4599 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: After each patient encounter, physicians compile extensive, semi-structured clinical summaries known as SOAP notes. While essential for both clinical practice and research, these notes are time-consuming to produce in digital form, contributing significantly to physician burnout. Large Language Models (LLMs) have recently shown promise in automating clinical note generation. Despite these advances, such models risk inadvertently causing harm and worsening existing health disparities, so systematic evaluation is crucial to ensure that clinical documentation tools uphold principles of health equity. We introduce the first comprehensive framework for assessing equity-related harms in LLM-generated, long-form clinical notes. Extensive empirical analysis reveals notable disparities in model-generated content across patient demographics. Our work aims to lay a foundation for automated clinical documentation tools that are not only efficient but also equitable in their impact on diverse patient populations.
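The abstract does not spell out the audit pipeline, but the title suggests counterfactual perturbation of patient demographics ("what if the patient were different?"). Below is a minimal, hypothetical sketch of such an audit, assuming a template-based transcript; `generate_note`, `harm_score`, and the demographic axes are illustrative stubs, not the paper's actual framework. A real pipeline would call an LLM and a validated toxicity or bias scorer on de-identified encounter transcripts.

```python
"""Sketch of a counterfactual demographic audit for LLM-generated
clinical notes. All names here are hypothetical illustrations."""
from itertools import product
from statistics import mean

# Hypothetical demographic axes to perturb in the encounter transcript.
DEMOGRAPHICS = {
    "sex": ["male", "female"],
    "race": ["white", "Black", "Asian", "Hispanic"],
}

# Toy template with demographic slots; a real audit would perturb
# de-identified encounter transcripts instead.
TRANSCRIPT = (
    "Patient is a 54-year-old {race} {sex} presenting with chest pain "
    "radiating to the left arm, onset two hours ago."
)


def generate_note(transcript: str) -> str:
    """Stub standing in for an LLM call (e.g., a chat-completions API)."""
    return f"SOAP note for encounter: {transcript}"


def harm_score(note: str) -> float:
    """Stub scorer; a real audit might use a toxicity classifier or a
    rubric-based judge. Here: fraction of hedging words, as a toy proxy."""
    hedges = {"possibly", "claims", "denies", "alleged"}
    tokens = note.lower().split()
    return sum(t.strip(".,") in hedges for t in tokens) / max(len(tokens), 1)


def audit() -> dict:
    """Generate one note per demographic combination and score each."""
    scores = {}
    for sex, race in product(DEMOGRAPHICS["sex"], DEMOGRAPHICS["race"]):
        note = generate_note(TRANSCRIPT.format(sex=sex, race=race))
        scores[(sex, race)] = harm_score(note)
    return scores


if __name__ == "__main__":
    scores = audit()
    # Disparity: gap between the most- and least-affected groups.
    gap = max(scores.values()) - min(scores.values())
    print(f"mean score: {mean(scores.values()):.4f}, max group gap: {gap:.4f}")
```

Holding the clinical content fixed while varying only the demographic attributes isolates demographic effects on the generated note; the group-level score gap is one simple disparity statistic, and the paper may well use different metrics.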
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias, Fairness and Toxicity in LLMs; Bias in LLM Clinical Applications; Clinical Note Generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4599